Merging two IEnumerable<T>s - C#

I have two IEnumerable<T>s.
One gets filled with the fallback elements and will always contain the most elements.
The other one gets filled depending on some parameters and will possibly contain fewer elements.
If an element doesn't exist in the second one, I need to fill it in with the equivalent element from the first one.
This code does the job, but it feels inefficient to me and requires me to cast the IEnumerables to ILists or to use a temporary list.
Person implements IEquatable<Person>.
IEnumerable<Person> fallBack = Repository.GetPersons();
IList<Person> translated = Repository.GetPersons(language).ToList();
foreach (Person person in fallBack)
{
    if (!translated.Any(p => p.Equals(person)))
        translated.Add(person);
}
Any suggestions?

translated.Union(fallBack)
or (if Person doesn't implement IEquatable<Person> by ID)
translated.Union(fallBack, PersonComparer.Instance)
where PersonComparer is:
public class PersonComparer : IEqualityComparer<Person>
{
    public static readonly PersonComparer Instance = new PersonComparer();

    // We don't need any more instances
    private PersonComparer() {}

    public int GetHashCode(Person p)
    {
        return p.id;
    }

    public bool Equals(Person p1, Person p2)
    {
        if (Object.ReferenceEquals(p1, p2))
        {
            return true;
        }
        if (Object.ReferenceEquals(p1, null) ||
            Object.ReferenceEquals(p2, null))
        {
            return false;
        }
        return p1.id == p2.id;
    }
}
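For completeness, a minimal sketch of how the merged sequence could be used with the question's variables (the ToList call is only needed if you actually want a list back):
// translated comes first, so translated entries win over their fallback equivalents
IEnumerable<Person> merged = translated.Union(fallBack, PersonComparer.Instance);
List<Person> mergedList = merged.ToList(); // materialize only if a list is required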

Try this.
public static IEnumerable<Person> SmartCombine(IEnumerable<Person> fallback, IEnumerable<Person> translated) {
    return translated.Concat(fallback.Where(p => !translated.Any(x => x.id.Equals(p.id))));
}

Use Concat. Union does not work in the case of a List<dynamic>.

Related

Multiple sorting rules with IComparable and IComparer?

I'm new to C#, just a question on how to use IComparable and IComparer properly. Let's say we have the following class:
public class Student
{
int score;
string name;
}
and I want to sort by score (desc) first, then by name (asc).
Now assume that I can't modify the Student class, so I can only use IComparer,
which means I have to make a helper class (let's say it is called StudentComparer) and put the sorting logic into it:
public class StudentComparer : IComparer
{
    public int Compare(object o1, object o2)
    {
        Student s1 = o1 as Student;
        Student s2 = o2 as Student;
        // not checking null for simplicity
        if (s1.score == s2.score)
            return String.Compare(s1.name, s2.name);
        else if (s1.score < s2.score)
            return -1;
        else
            return 1;
    }
}
Here is my question: sometimes I only need a single rule later, for example sorting just by name or just by score. So I have to make another two helper classes (ScoreComparer and NameComparer) that implement IComparer and duplicate the code already in StudentComparer:
public class ScoreComparer : IComparer
{
    int Compare(object o1, object o2)
    {
        // half the logic of StudentComparer
    }
}

public class NameComparer : IComparer
{
    int Compare(object o1, object o2)
    {
        // half the logic of StudentComparer
    }
}
My case is pretty simple, but imagine a complicated case where each comparer consists of hundreds of lines of code. How can I avoid the duplicated code? Or is there a way to combine multiple comparers A, B, C, D ... into a common comparer that checks A, B, C, D in sequence, just like an ORDER BY clause in SQL?
Back in the days before LINQ, I used to use comparers a lot.
The way I handled different sort options was by specifying those options in the constructor of the comparer implementation and then using that information in the Compare method.
Luckily, we now have LINQ, so this entire thing can be done with a single, fluent line of code:
// sortedStudents is an IEnumerable<Student> sorted by score and name.
var sortedStudents = students.OrderBy(s => s.Score).ThenBy(s => s.Name);
However, if for some reason you need to work the old-fashioned way, using comparers and the like, here is how I would have handled this back then:
internal enum CompareBy
{
    NameOnly,
    ScoreAndName
}

public class StudentComparer : IComparer<Student>
{
    private CompareBy _compareBy;

    public StudentComparer(CompareBy compareBy)
    {
        _compareBy = compareBy;
    }

    public int Compare(Student s1, Student s2)
    {
        // not checking null for simplicity
        var nameCompare = string.Compare(s1.name, s2.name);
        if (_compareBy == CompareBy.NameOnly)
        {
            return nameCompare;
        }
        // since there are only two members in the enum it's safe to write it like this.
        // if the enum grows, you must change the code.
        if (s1.score == s2.score)
        {
            return nameCompare;
        }
        else if (s1.score < s2.score)
        {
            return -1;
        }
        return 1;
    }
}
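As for chaining several comparers in sequence (the ORDER BY-style composition asked about), one option is a small composite comparer that asks each inner comparer in turn and returns the first non-zero result. This is only a sketch; the CompositeComparer name is made up here, and the usage line assumes ScoreComparer and NameComparer implement IComparer<Student>:
public class CompositeComparer<T> : IComparer<T>
{
    private readonly IComparer<T>[] _comparers;

    public CompositeComparer(params IComparer<T>[] comparers)
    {
        _comparers = comparers;
    }

    public int Compare(T x, T y)
    {
        // ask each comparer in order; the first one that sees a difference decides
        foreach (var comparer in _comparers)
        {
            int result = comparer.Compare(x, y);
            if (result != 0)
            {
                return result;
            }
        }
        return 0;
    }
}
// usage, assuming ScoreComparer and NameComparer implement IComparer<Student>:
// studentList.Sort(new CompositeComparer<Student>(new ScoreComparer(), new NameComparer()));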

How can I efficiently compare the properties of two large lists of objects in C#?

I have a dataset of two lists of objects; each object has an ID that is consistent in both lists, but the other properties may or may not differ. How can I most efficiently retrieve the objects that differ in one or more properties?
My usual approach has been something along the lines of this. My object is set up like:
public class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }

    public bool IsEqual(Person other)
    {
        if (Name != other.Name)
        {
            return false;
        }
        if (Age != other.Age)
        {
            return false;
        }
        return true;
    }
}
Where the IsEqual comparator is used to compare it to some equivalent object.
And then my method for finding modified people is like:
public static List<Person> FindModifiedPeople(List<Person> listA, List<Person> listB)
{
    var modifiedPeople = new List<Person>();
    foreach (var personA in listA)
    {
        var matchingPerson = listB.FirstOrDefault(e => e.ID == personA.ID);
        if (matchingPerson == null)
        {
            continue;
        }
        if (!personA.IsEqual(matchingPerson))
        {
            modifiedPeople.Add(personA);
        }
    }
    return modifiedPeople;
}
In my dataset, I don't care about people that are in listB but not in listA, so I don't need to loop through both lists. For each element of listA I only need to check whether a matching element exists in listB (it may or may not be there) and return a list of the people that have been modified (using the elements from listA).
This approach worked fine for reasonably small lists, but now I have two lists of about 160,000 people each and it takes several minutes. Is there any way to make this method more efficient while still returning what I need it to?
If you can change your lists to be a Dictionary<int, Person> with the person's ID as the key, then this would work for you. It runs in O(n) as opposed to your O(n^2).
public static List<Person> FindModifiedPeople(Dictionary<int, Person> dictA, Dictionary<int, Person> dictB)
{
    var modifiedPeople = new List<Person>();
    foreach (var personA in dictA)
    {
        Person matchingPerson;
        if (dictB.TryGetValue(personA.Key, out matchingPerson))
        {
            if (!personA.Value.IsEqual(matchingPerson))
            {
                modifiedPeople.Add(personA.Value);
            }
        }
    }
    return modifiedPeople;
}
You could also change the return type from List to another Dictionary, depending on what you need it for.
EDIT
As @maccettura pointed out in a comment, you really should override the built-in Equals method. That would make your code look something like this:
public override bool Equals(Object obj)
{
    if (obj == null || GetType() != obj.GetType())
        return false;

    var otherPerson = (Person)obj;
    if (Name != otherPerson.Name)
    {
        return false;
    }
    if (Age != otherPerson.Age)
    {
        return false;
    }
    return true;
}
This allows your code to work with anything that expects to use the standard Equals method rather than your custom one.
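Note that when you override Equals, the compiler also expects a matching GetHashCode override so that hash-based collections stay consistent. A minimal sketch that mirrors the Equals above (combining Name and Age; the 17/23 constants are just a common convention):
public override int GetHashCode()
{
    // equal objects (same Name and Age) must produce equal hash codes
    unchecked
    {
        int hash = 17;
        hash = hash * 23 + (Name != null ? Name.GetHashCode() : 0);
        hash = hash * 23 + Age.GetHashCode();
        return hash;
    }
}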
Are you sure that the comparison is the bottleneck? I think that the problem comes from the search you do in this line:
var matchingPerson = listB.FirstOrDefault(e => e.ID == personA.ID);
There, you are doing a linear search with complexity O(n), which coupled with the foreach loop gives a total complexity of O(n^2). Instead, you could create a dictionary upfront, which takes some time, but in which lookups are much faster. The dictionary should use the IDs as keys and can easily be created like this, before the foreach loop:
var dictB = listB.ToDictionary(p => p.ID);
After that, your lookup would be much faster, like this:
Person matchingPerson;
if (dictB.TryGetValue(personA.ID, out matchingPerson))
{
    if (!personA.IsEqual(matchingPerson))
    {
        modifiedPeople.Add(personA);
    }
}
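Putting the two snippets together, the whole method might look something like this (a sketch only; it keeps the names and the List<Person> return type from the question):
public static List<Person> FindModifiedPeople(List<Person> listA, List<Person> listB)
{
    // build the lookup once, O(n), instead of scanning listB for every element of listA
    var dictB = listB.ToDictionary(p => p.ID);

    var modifiedPeople = new List<Person>();
    foreach (var personA in listA)
    {
        Person matchingPerson;
        if (dictB.TryGetValue(personA.ID, out matchingPerson) && !personA.IsEqual(matchingPerson))
        {
            modifiedPeople.Add(personA);
        }
    }
    return modifiedPeople;
}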

Can I use LINQ to check if objects in a list have a unique ID?

Say I have a list containing objects like this one:
public class Person
{
    private string _name;
    private string _id;
    private int _age;

    public Person()
    {
    }

    // Accessors
}
public class ManipulatePerson
{
    Person person = new Person();
    List<Person> personList = new List<Person>();

    // Assign values
    private void PopulateList()
    {
        // Loop
        personList.Add(person);
        // Check if every Person has a unique ID
    }
}
and I wanted to check that each Person had a unique ID. I would like to return a boolean true/false depending on whether or not the IDs are unique. Is this something I can achieve with LINQ?
Note that you can even leverage a HashSet<T> directly:
var hs = new HashSet<string>();
bool areAllPeopleUnique = personList.All(x => hs.Add(x.Id));
(and this is the code I normally use)
It has the advantage that, in the best case (when there are duplicates), it stops before analyzing the whole personList collection.
I would use Distinct and then check against the counts, for example:
bool bAreAllPeopleUnique = personList.Select(p => p.ID).Distinct().Count() == personList.Count;
However, as @Ian commented, you will need to add a property to the Person class so that you can access the ID, like so:
public string ID
{
    get { return _id; }
}
A 'nicer' way to implement this would be to add a method like so:
private bool AreAllPeopleUnique(IEnumerable<Person> people)
{
    return people.Select(p => p.ID).Distinct().Count() == people.Count();
}
NOTE: The method takes an IEnumerable rather than a List so that any collection implementing that interface can use it.
One of the best ways to do this is overriding Equals and GetHashCode, and implementing IEquatable<T>:
public class Person : IEquatable<Person>
{
    public string Id { get; set; }

    public override bool Equals(object some) => Equals(some as Person);
    public override int GetHashCode() => Id != null ? Id.GetHashCode() : 0;
    public bool Equals(Person person) => person != null && person.Id == Id;
}
Now you can use a HashSet<T> to store unique objects, and it will be impossible to store duplicates. In addition, if you try to add a duplicate item, Add will return false.
NOTE: My IEquatable<T> and Equals/GetHashCode overrides are very basic, but this sample implementation should give you a good hint on how to handle your scenario elegantly.
You can check this Q&A to get an idea of how to implement GetHashCode: What is the best algorithm for an overridden System.Object.GetHashCode?
This other Q&A might also be interesting for you: Why is it important to override GetHashCode when Equals method is overridden?
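For illustration, a minimal sketch of the Add behaviour with the Person above (the Id values are made up):
var set = new HashSet<Person>();
bool first = set.Add(new Person { Id = "42" });  // true: new element
bool second = set.Add(new Person { Id = "42" }); // false: duplicate Id, the set still holds one item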
You can use GroupBy for getting unique items:
var result = personList.GroupBy(p => p.Id)
                       .Select(grp => grp.First())
                       .ToList();

Select a Collection with same interface

If I have the following classes:
public interface ISomething { int Id { get; set; } }
public class SomethingA : ISomething {...}
public class SomethingB : ISomething {...}
In another class I have the following two lists:
List<SomethingA> aValues;
List<SomethingB> bValues;
My question is whether it is possible to do something like this:
public List<ISomething> GetList(bool select) {
return select ? aValues : bValues;
}
My goal is to use it like this:
GetList(true).Single(x => x.Id) // or
foreach (var value in GetList(false))
{
value.Id = 18;
}
// anything else
UPDATE:
I see, there are good possibilities. But is there also a way to also achieve the following?
GetList(true).Remove(myValue);
You can't return List<ISomething> because List<T> is not covariant, and classes can't be. IEnumerable<T> is covariant, so you may use it as a read-only sequence.
Change the method to return IEnumerable<ISomething>
public static IEnumerable<ISomething> GetList(bool select)
{
    return select ? (IEnumerable<ISomething>)aValues : bValues;
}
Then do
var result = GetList(true).Single(x => x.Id == 0);
foreach (var value in GetList(false))
{
value.Id = 18;
}
As for your update: if you want to remove items, you need to give up some flexibility, i.e. use the non-generic IList as the return type.
public static IList GetList(bool select)
{
return select ? (IList)aValues : bValues;
}
Then do
IList list = GetList(true);
foreach (var value in list.OfType<ISomething>()) // OfType or Cast can be used
{
    if (value.Id == 6) // whatever condition
    {
        list.Remove(value);
        break;
    }
}
I like the OfType extension because it returns the typed sequence you need:
var listA = initialList.OfType<TypeA>(); // IEnumerable<TypeA>
var listB = initialList.OfType<TypeB>(); // IEnumerable<TypeB>
So in your case you start with
var aValues = initialList.OfType<SomethingA>();
and then you can iterate over whichever subcollection you need. Of course you are then working with an IEnumerable<SomethingA>, but that can be converted implicitly back to an IEnumerable<ISomething>.
If you want to filter out values, I would create explicit methods to remove them, but it depends on what you need to achieve in the end (for example comparing on an Id instead of the whole object):
public static IEnumerable<T> Remove<T>(this IEnumerable<ISomething> values, T valueToRemove) where T : IComparable
{
    return values.OfType<T>().Where(t => valueToRemove.CompareTo(t) != 0);
}
The simplest solution may be using LINQ's Cast() like this:
public List<ISomething> GetList(bool select)
{
    return (select ? aValues.Cast<ISomething>() : bValues.Cast<ISomething>()).ToList();
}
I see, there are good possibilities. But is there also a way to also achieve the following?
GetList(true).Remove(myValue);
To remove from the original lists, you are likely best off with a specialized Remove method on the class in question, as others have suggested, since most solutions here return a copy of the original list.
You may remove the element from a copy of the list quite easily like so, but I understand that's not what you are asking.
var result = GetList(true);
result.Remove(myValue);
You can either use the .Cast<T> method like this:
if (select)
{
return aValues.Cast<ISomething>().ToList();
}
else
{
return bValues.Cast<ISomething>().ToList();
}
or add all items to a common List<ISomething> like this:
var ret = new List<ISomething>();
if (select)
{
ret.AddRange(aValues);
}
else
{
ret.AddRange(bValues);
}
return ret;
Since you only want to iterate it, I would write the method like this:
public IEnumerable<ISomething> GetList(bool select) {
return select ? aValues.Cast<ISomething>() : bValues.Cast<ISomething>();
}
You can also look at this StackOverflow question.

C# generic HashSet allows duplicates

Reading about HashSet<T> on MSDN, it says that if T implements IEquatable<T>, then the HashSet uses that implementation for EqualityComparer<T>.Default.
So, take the following Person class:
public class Person : IEquatable<Person>
{
    private string pName;

    public Person(string name) { pName = name; }

    public string Name
    {
        get { return pName; }
        set
        {
            if (pName.Equals(value, StringComparison.InvariantCultureIgnoreCase))
            {
                return;
            }
            pName = value;
        }
    }

    public bool Equals(Person other)
    {
        if (other == null) { return false; }
        return pName.Equals(other.pName, StringComparison.InvariantCultureIgnoreCase);
    }

    public override bool Equals(object obj)
    {
        Person other = obj as Person;
        if (other == null) { return false; }
        return Equals(other);
    }

    public override int GetHashCode() { return pName.GetHashCode(); }

    public override string ToString() { return pName; }
}
So, let's define in another class or main function:
HashSet<Person> set = new HashSet<Person>();
set.Add(new Person("Smith")); // returns true
Person p = new Person("Smi");
set.Add(p); // returns true
p.Name = "Smith"; // no error occurs
And now you've got two Person objects in the HashSet with the same name (so they are "equal").
HashSet lets us store duplicate objects.
HashSet lets us store duplicate objects.
It isn't letting you put in duplicate objects. The issue is that you're mutating the object after it's been added.
Mutating objects that are used as keys in dictionaries or stored in hash-based sets is always problematic, and something I would recommend avoiding.
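To see why, a minimal sketch continuing the example above: once the mutation changes the hash code, the set can no longer locate the item in the bucket where it was originally stored.
var set = new HashSet<Person>();
var p = new Person("Smi");
set.Add(p);

p.Name = "Smith"; // the hash code changes after insertion

// The set still counts one element, but lookups will most likely fail,
// because the item sits in the bucket computed from the old hash code.
Console.WriteLine(set.Count);                         // 1
Console.WriteLine(set.Contains(p));                   // likely false
Console.WriteLine(set.Contains(new Person("Smith"))); // likely false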
