What's the fastest way (coding-wise) to check whether an entry exists in a list?
My object has two properties:
public class Name
{
public string FirstName{ get; set; }
public string LastName { get; set; }
}
then I have another class like this:
public class Foo
{
public List<Name> Names { get; set; } = new List<Name>();
public bool Contains(Name x)
{
if (x == null)
return false;
// Navigate || Equals || Linq.Contains
// What's the easiest way to do this?
}
}
For a List, lookup is O(n) and appending is amortized O(1), so the fastest options are:
At least one:
Names.Any(n=> x.FirstName == n.FirstName && x.LastName == n.LastName)
Exactly One:
Names.Count(n=> x.FirstName == n.FirstName && x.LastName == n.LastName) == 1
Any() is faster because it short-circuits as soon as it finds the first matching Name. Count() always scans the whole list to count every match.
Instead, you could use a hash-based collection (e.g. HashSet, Dictionary) where lookups are O(1) on average. However, these collections don't have the same properties as a List (for example, no access by index and no guaranteed order). Note that a HashSet<string> where names are stored as something like FirstName + (delimiter) + LastName is likely to be the fastest option for lookups.
You could also use a SortedList, where lookups are O(log n). However, inserting into a SortedList is O(n) in general, because existing elements have to be shifted to keep the list sorted.
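As a rough sketch of the hash-based option mentioned above: the following assumes C# 7 value tuples are available and introduces a hypothetical NameIndex class (not part of the question's code). It stores (FirstName, LastName) pairs in a HashSet, which avoids having to choose a delimiter for a concatenated string key.

```csharp
using System.Collections.Generic;

public class Name
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

// Hypothetical helper: indexes names by value so lookups are O(1) on average.
public class NameIndex
{
    // Value tuples compare component-wise, so no custom comparer is needed.
    private readonly HashSet<(string First, string Last)> keys =
        new HashSet<(string First, string Last)>();

    // Returns false if an equal name was already present.
    public bool Add(Name n)
    {
        return keys.Add((n.FirstName, n.LastName));
    }

    // Returns true if a name with the same first and last name was added before.
    public bool Contains(Name x)
    {
        return x != null && keys.Contains((x.FirstName, x.LastName));
    }
}
```

The trade-off is that the index has to be kept in sync with the list if you still need the list for ordering or duplicates.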
I would say linq .Any is pretty easy
http://msdn.microsoft.com/en-us/library/system.linq.enumerable.any.aspx
Names.Any(n => n == x) // compares references unless Name overloads ==
Using LINQ should be easier to read.
Here is a sample using Any:
public bool Contains(Name x)
{
if (x == null)
return false;
return this.Names.Any(item => item.FirstName == x.FirstName && item.LastName == x.LastName);
}
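Suggestion: if the items in your list are supposed to be unique, you could use System.Collections.Generic.HashSet and call Contains on it. For that to match by value, Name needs to override Equals and GetHashCode (or the set needs a custom IEqualityComparer<Name>).
You might want to compare the performance of the Any and Contains methods in the following code. Note that BinarySearch only gives correct results if Names is already sorted with the same comparer:
partial class Foo {
class NameComparer: IComparer<Name> {
public int Compare(Name x, Name y) {
if (object.ReferenceEquals(x, y)) return 0;
if (x == null) return -1;
if (y == null) return 1;
// Order by last name, then by first name.
int byLast = string.CompareOrdinal(x.LastName, y.LastName);
return byLast != 0 ? byLast : string.CompareOrdinal(x.FirstName, y.FirstName);
}
public static readonly NameComparer Default = new NameComparer();
}
public bool Any(Name x) {
return
Names.Any(
y => object.ReferenceEquals(x, y)
|| y.LastName == x.LastName && y.FirstName == x.FirstName);
}
public bool Contains(Name x) {
// Requires Names to be sorted with NameComparer.Default.
return Names.BinarySearch(x, NameComparer.Default) >= 0;
}
}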
Related
I have one model class:
public class Person
{
public int Id { get; set; }
public string Name { get; set; }
}
When I combine two lists like this:
List<Person> people1 = new List<Person> {
new Person() { Id = 1, Name = "Name1" },
new Person() { Id = 2, Name = "Name2" },
new Person() { Id = 3, Name = "Name3" },
};
List<Person> people2 = new List<Person> {
new Person() { Id = 1, Name = "Name1" },
new Person() { Id = 4, Name = "Name4" },
new Person() { Id = 5, Name = "Name5" },
};
people1.AddRange(people2);
If a person in people2 has the same Id as a person in people1, I don't want it added. How can I do that?
You can use LINQ for this fairly easily but inefficiently:
people1.AddRange(people2.Where(p2 => !people1.Any(p1 => p1.Id == p2.Id)));
Or you could create a set of IDs first:
HashSet<int> people1Ids = new HashSet<int>(people1.Select(p1 => p1.Id));
people1.AddRange(people2.Where(p2 => !people1Ids.Contains(p2.Id)));
The first approach is obviously simpler, but if your lists get large, it could become slow, because for every element in people2, it'll look through every element in people1.
The second approach will be significantly faster if people1 is large. If it's people2 that's large instead, you won't get much benefit. For example, if people1 only contains a couple of people, then looking in a hash set for an ID won't be much faster than looking through the list.
You could take an entirely different approach though. If you make your Person type implement IEquatable<Person> based on ID - or create an IEqualityComparer<Person> that does so, and if you don't so much need the existing list to be modified, so much as you need "the union of the two lists", and if you don't care about the order, and if all the entries in each list are unique or you don't mind duplicates being removed, you could just use:
// Specify the comparer as another argument if you need to.
// You could call ToList on the result if you need a list.
var union = people1.Union(people2);
(That's a lot of conditions for that solution, but they may well all be valid.)
You can use the Union operator for this with a custom IEqualityComparer. This creates a new sequence that combines the two lists (call ToList on it if you need a list). Implementing a custom IEqualityComparer gives you control over what constitutes the same record.
var allPeople = people1.Union(people2, new PersonComparer());
public class PersonComparer : IEqualityComparer<Person>
{
public bool Equals(Person x, Person y)
{
// omitted null checks etc
return x.Id == y.Id;
}
public int GetHashCode(Person obj)
{
// omitted null checks etc
return obj.Id.GetHashCode();
}
}
Use this:
people1.AddRange(people2.Except(people1));
But you first need to override Equals and GetHashCode in the Person class:
public class Person
{
public int Id { get; set; }
public string Name { get; set; }
public override bool Equals(object obj)
{
if (!(obj is Person))
return false;
Person p = (Person)obj;
return (p.Id == Id && p.Name == Name);
}
public override int GetHashCode()
{
return String.Format("{0}|{1}", Id, Name).GetHashCode();
}
}
Or you can write a custom equality comparer and pass it to Except (the same comparer also works with Distinct):
public class DistinctItemComparer : IEqualityComparer<Person>
{
public bool Equals(Person x, Person y)
{
return x.Id == y.Id &&
x.Name == y.Name;
}
public int GetHashCode(Person obj)
{
return obj.Id.GetHashCode() ^
obj.Name.GetHashCode();
}
}
Then use it like this:
people1.AddRange(people2.Except(people1, new DistinctItemComparer()));
If you just need to distinguish items based on Id, you can exclude Name from these two methods.
The second approach seems better, since Microsoft's guidelines advise against overloading the equality operator on reference types.
Have you thought about using a Dictionary instead?
You can use the Id as a Key and it won't allow duplicate Keys to exist?
The following code is a way I've used before:
var dictionary = people1.ToDictionary(x => x.Id, x => x);
foreach (var person in people2)
{
if (!dictionary.ContainsKey(person.Id))
{
dictionary.Add(person.Id, person);
}
}
There may be a better way of doing it but this has worked for me.
This way when you add an item to the dictionary it won't let you add something with the same Id.
Also check out HashSets as they do a similar thing.
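To illustrate the HashSet suggestion, here is a minimal sketch. The PersonMerge class and the AddMissingById method are hypothetical names chosen for this example. It relies on HashSet<T>.Add returning false when the value is already present, so the set of known Ids doubles as the duplicate check.

```csharp
using System.Collections.Generic;

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class PersonMerge
{
    // Appends to target every person from source whose Id is not already present.
    public static void AddMissingById(List<Person> target, IEnumerable<Person> source)
    {
        var seenIds = new HashSet<int>();
        foreach (var p in target)
        {
            seenIds.Add(p.Id);
        }

        foreach (var p in source)
        {
            // HashSet<T>.Add returns false when the value is already in the set.
            if (seenIds.Add(p.Id))
            {
                target.Add(p);
            }
        }
    }
}
```

Unlike the dictionary version, this keeps people1 as a list and preserves the order in which people were added.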
I think a good solution here is to use LINQ. It's quite simple and short code to write:
people1.AddRange(people2.Where(x => !people1.Any(y => x.Id == y.Id)));
I have a dataset of two lists of objects, which has an ID that will be consistent in both lists but other properties that may or may not be different. How can I most efficiently retrieve the ones that are different based on one or more properties?
My usual approach has been something along the lines of this. My object is set up like:
public class Person
{
public int ID { get; set; }
public string Name { get; set; }
public int Age { get; set; }
public bool IsEqual(Person other)
{
if (Name != other.Name)
{
return false;
}
if (Age != other.Age)
{
return false;
}
return true;
}
}
Where the IsEqual comparator is used to compare it to some equivalent object.
And then my method for finding modified people is like:
public static List<Person> FindModifiedPeople(List<Person> listA, List<Person> listB)
{
var modifiedPeople = new List<Person>();
foreach (var personA in listA)
{
var matchingPerson = listB.FirstOrDefault(e => e.ID == personA.ID);
if (matchingPerson == null)
{
continue;
}
if (!personA.IsEqual(matchingPerson))
{
modifiedPeople.Add(personA);
}
}
return modifiedPeople;
}
In my dataset, I don't care about people that are in listB but not listA, so I don't need to loop through both lists. I only need to check listA for the element in listB (that may or may not be there) and return a list of people that have been modified (with the elements from listA).
This approach worked fine for reasonably small lists, but now I have two lists with about 160,000 people and this approach takes several minutes. Is there any way to make this method more efficient while still returning what I need it to?
If you can change your lists to be a Dictionary<int, Person> with the person's ID as the key, then this would work for you. It runs in O(n) as opposed to your O(n^2).
public static List<Person> FindModifiedPeople(Dictionary<int, Person> dictA, Dictionary<int, Person> dictB)
{
var modifiedPeople = new List<Person>();
foreach (var personA in dictA)
{
Person matchingPerson;
if(dictB.TryGetValue(personA.Key, out matchingPerson))
{
if (!personA.Value.IsEqual(matchingPerson))
{
modifiedPeople.Add(personA.Value);
}
}
}
return modifiedPeople;
}
You could also change the return type from List to another Dictionary as well depending on what you need it for.
EDIT
As @maccettura pointed out in his comment, you really should override the built-in Equals method (and GetHashCode along with it). That would make your code look something like this:
public override bool Equals(Object obj)
{
if (obj == null || GetType() != obj.GetType())
return false;
var otherPerson = (Person)obj;
if (Name != otherPerson.Name)
{
return false;
}
if (Age != otherPerson.Age)
{
return false;
}
return true;
}
This allows your code to work with anything that expects the standard Equals method rather than your custom one.
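Overriding Equals without GetHashCode produces a compiler warning and can make hash-based collections (HashSet, Dictionary keys, LINQ's Distinct and Except) misbehave. As a sketch only, a matching pair could look like the following. It assumes Name and Age are the fields that define equality, as in the Equals override above, and the constants 17 and 23 are just a common convention.

```csharp
public class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }

    public override bool Equals(object obj)
    {
        if (obj == null || GetType() != obj.GetType())
            return false;
        var other = (Person)obj;
        return Name == other.Name && Age == other.Age;
    }

    // Uses the same fields as Equals so that equal objects share a hash code.
    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 23 + (Name == null ? 0 : Name.GetHashCode());
            hash = hash * 23 + Age;
            return hash;
        }
    }
}
```

Because the hash depends on mutable properties, an object should not be modified while it is stored in a hash-based collection.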
Are you sure that the comparison is the bottleneck? I think that the problem comes from the search you do in this line:
var matchingPerson = listB.FirstOrDefault(e => e.ID == personA.ID);
There, you are doing a linear search with a complexity of O(n), which coupled with the foreach loop gives a total complexity of O(n^2). Instead, you could create a dictionary up front, which takes some time, but in which lookups are much faster. The dictionary should have the ID as its key and can easily be created like this, before the foreach loop:
var dictB = listB.ToDictionary(p => p.ID);
After that, your lookup would be much faster, like this:
Person matchingPerson;
if (dictB.TryGetValue(personA.ID, out matchingPerson))
{
if (!personA.IsEqual(matchingPerson))
{
modifiedPeople.Add(personA);
}
}
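Putting those two pieces together, the whole method might look like the sketch below. It keeps the original List-based signature and the question's IsEqual method. The PersonDiff class name is made up for the example, and the code assumes the IDs in listB are unique, since ToDictionary throws on duplicate keys.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Person
{
    public int ID { get; set; }
    public string Name { get; set; }
    public int Age { get; set; }

    public bool IsEqual(Person other)
    {
        return Name == other.Name && Age == other.Age;
    }
}

public static class PersonDiff
{
    public static List<Person> FindModifiedPeople(List<Person> listA, List<Person> listB)
    {
        // Build the lookup once: O(n). Throws if listB contains duplicate IDs.
        var dictB = listB.ToDictionary(p => p.ID);
        var modifiedPeople = new List<Person>();

        foreach (var personA in listA)
        {
            Person matchingPerson;
            // Each lookup is O(1) on average, so the loop is O(n) overall.
            if (dictB.TryGetValue(personA.ID, out matchingPerson)
                && !personA.IsEqual(matchingPerson))
            {
                modifiedPeople.Add(personA);
            }
        }
        return modifiedPeople;
    }
}
```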
How can I compare two lists and get the non-matching items, based only on specific properties?
public partial class Cable : StateObject
{
public int Id { get; set; }
public int CablePropertyId { get; set; }
public int Item { get; set; }
public int TagNo { get; set; }
public string GeneralFormat { get; set; }
public string EndString { get; set; }
public string CableRevision { get; set; }
}
I want the comparison to be based on CablePropertyId, TagNo and CableRevision. If I use
var diffCables = sourceCables.Except(destinationCables).ToList();
all of the properties are compared with each other. How can I do that?
Use the LINQ Except method with a custom IEqualityComparer.
http://msdn.microsoft.com/en-us/library/bb336390(v=vs.110).aspx
class CableComparer : IEqualityComparer<Cable>
{
public bool Equals(Cable x, Cable y)
{
return (x.CablePropertyId == y.CablePropertyId && ...);
}
public int GetHashCode(Cable x) // Without a valid GetHashCode based on the values you compare on, LINQ won't work properly
{
unchecked
{
int hash = 17;
hash = hash * 23 + x.CablePropertyId;
hash = hash * 23 + ...
return hash;
}
}
}
var diffCables = sourceCables.Except(destinationCables, new CableComparer());
Also, ToList() operation on the result isn't really necessary. Most of the time you can just operate on the result of Linq query IEnumerable without specifying the exact type; this way you won't waste performance on unneeded ToList() operation.
By the way, a couple of others proposed Where-based queries with simple lambda. Such solution is easier to read (in my opinion), but it's also less optimized: it forces n^2 checks, while IEqualityComparer allows Linq to be more optimal because of GetHashCode() method. Here's a great answer on importance of GetHashCode, and here's a great guide on writing GetHashCode() override.
You can create your own IEqualityComparer<Cable> like this:
public class CableComparer : IEqualityComparer<Cable>
{
public bool Equals(Cable x, Cable y)
{
return x.CablePropertyId == y.CablePropertyId &&
x.TagNo == y.TagNo &&
x.CableRevision == y.CableRevision;
}
// If Equals() returns true for a pair of objects
// then GetHashCode() must return the same value for these objects.
public int GetHashCode(Cable x)
{
return x.CablePropertyId ^
x.TagNo.GetHashCode() ^
x.CableRevision.GetHashCode();
}
}
Then use this overload of Except:
var comparer = new CableComparer();
var diffCables = sourceCables.Except(destinationCables, comparer).ToList();
Alternatively, the MoreLINQ library (also available on NuGet) provides a convenient ExceptBy method:
var diffCables = sourceCables.ExceptBy(
destinationCables,
x => new {
x.CablePropertyId,
x.TagNo,
x.CableRevision
})
.ToList();
You can override the Equals and GetHashCode methods of Cable if you will always compare this object in this manner.
Otherwise you can write a custom comparer and use the overload of the Enumerable.Except method that accepts it.
I think you can use something like this:
sourceCables.Where(sc => !destinationCables.Any(dc => dc.CablePropertyId == sc.CablePropertyId && ...));
Essentially, when you want to compare your own types, you'll need to describe how they compare/differ from each other. Linq wouldn't know which properties in your Cable class are different right?
So you build a comparer which can be used generally to compare two types.
In this case, two Cable instances:
class CableComparer : IEqualityComparer<Cable>
{
public bool Equals(Cable c1, Cable c2) // these represent any two cables
{
if (c1.CablePropertyId == c2.CablePropertyId && ...)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(Cable c)
{
// the hash must be built from properties that Equals compares,
// otherwise equal cables can end up with different hash codes
return c.CablePropertyId.GetHashCode();
// to mix in more of the compared properties:
//return c.CablePropertyId ^ c.TagNo;
}
}
Then:
IEnumerable<Cable> except =
sourceCables.Except(destinationCables, new CableComparer());
If you use LINQ with IQueryable<>, there may be a solution using Where():
var destinationCablesAnon = destinationCables.Select(a => new { a.CablePropertyId, a.TagNo, a.CableRevision }); // add ToArray() if you use IEnumerable<>
var diffCables = sourceCables.Where(a => !destinationCablesAnon.Contains(new { a.CablePropertyId, a.TagNo, a.CableRevision })).ToList();
EDIT:
What I'm trying to do is find out whether db.Id is equal to xml.Id, db.SubTitle is equal to xml.SubTitle, and so on for all my properties.
I also tried
bool result = db.SequenceEqual(xml) but it returns false all the time.
END EDIT
I searched before asking for help, and I'm not sure what the best way to approach my problem is.
I have two IList objects whose items have exactly the same properties, but the data might be different.
One list is populated from the database and the other is read from XML; I want to compare them to check that both sources are in sync.
Here is what my object looks like:
public class EmployeeObject
{
public Int32 Id { get; set; }
public string SubTitle { get; set; }
public string Desc { get; set; }
public bool Active { get; set; }
public string ActiveDateTime { get; set; }
}
here is what I have tried:
IList<EmployeeObject> db = Db.EmployeeRepository.PopulateFromDb();
IList<EmployeeObject> xml = Xml.EmployeeRepository.PopulateFromXml();
//both object populated with data so far so good....
Time to compare now.
I tried something like this:
if ((object)xml == null || ((object)db) == null)
return Object.Equals(xml, db);
return xml.Equals(db); // returning false all the time
I have checked that both objects contain exactly the same data, but it still returns false.
The Equals method that you are using is going to determine if the two references refer to the same list, not if the contents are the same. You can use SequenceEqual to actually verify that two sequences have the same items in the same order.
Next you'll run into the issue that the items in the lists will be compared by reference, rather than by their field values or ID values, which seems to be what you want here. One option is a custom comparer, but another is to pull out the "identity" value in question:
bool areEqual = db.Select(item => item.Id)
.SequenceEqual(xml.Select(item => item.Id));
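The difference between the two checks can be seen in a small sketch. The EqualityDemo class is a hypothetical name, and int lists stand in for the employee lists so the example stays self-contained.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EqualityDemo
{
    // List<T> does not override Equals, so this is a reference comparison.
    public static bool ReferenceCheck(IList<int> a, IList<int> b)
    {
        return a.Equals(b);
    }

    // SequenceEqual compares the elements pairwise, in order.
    public static bool ContentCheck(IList<int> a, IList<int> b)
    {
        return a.SequenceEqual(b);
    }
}
```

Two separately built lists with the same contents fail the first check and pass the second.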
You should override Equals and GetHashCode in your class like this:
public class EmployeeObject {
public Int32 Id { get; set; }
public string SubTitle { get; set; }
public string Desc { get; set; }
public bool Active { get; set; }
public string ActiveDateTime { get; set; }
public override bool Equals(object o){
EmployeeObject e = o as EmployeeObject;
if(e == null) return false;
return Id == e.Id && SubTitle == e.SubTitle && Desc == e.Desc
&& Active == e.Active && ActiveDateTime == e.ActiveDateTime;
}
public override int GetHashCode(){
return Id.GetHashCode() ^ SubTitle.GetHashCode() ^ Desc.GetHashCode()
^ Active.GetHashCode() ^ ActiveDateTime.GetHashCode();
}
}
Then use the SequenceEqual method:
return db.OrderBy(e=>e.Id).SequenceEqual(xml.OrderBy(e=>e.Id));
IList does not have an Equals method. What you're calling is the standard Object equals which checks whether two variables point to the same object or not.
If you want to check that the lists are semantically equivalent, you will need to check that each object in the list is equivalent. If the EmployeeObject class has an appropriate Equals method, then you can use SequenceEqual to compare the lists.
You can implement an IEqualityComparer and use the overload of SequenceEqual that takes an IEqualityComparer. Here is sample code for an IEqualityComparer from MSDN:
class BoxEqualityComparer : IEqualityComparer<Box>
{
public bool Equals(Box b1, Box b2)
{
if (b1.Height == b2.Height && b1.Length == b2.Length && b1.Width == b2.Width)
{
return true;
}
else
{
return false;
}
}
public int GetHashCode(Box bx)
{
int hCode = bx.Height ^ bx.Length ^ bx.Width;
return hCode.GetHashCode();
}
}
You can then use SequenceEqual like this:
if (db.SequenceEqual(xml, new MyEqualityComparer()))
{ /* Logic here */ }
Note that this will only return true if the items are also in the same order in both lists. If appropriate, you can pre-order the items like this:
if (db.OrderBy(item => item.Id).SequenceEqual(xml.OrderBy(item => item.Id), new MyEqualityComparer()))
{ /* Logic here */ }
Obviously, return xml.Equals(db); will always return false if you are comparing two different list instances.
The only way for this to make sense is for you to be more specific about what it means for those two lists to be equal. That is, you need to go through the elements in the two lists and ensure that both lists contain the same items. Even that is ambiguous, but assuming that the elements in your lists provide a proper override for Equals() and GetHashCode(), you can proceed to implement the actual list comparison.
Generally, the most efficient method to compare two lists that don't contain duplicates will be to use a hash set constructed from elements of one of the lists and then iterate through the elements of the second, testing whether each element is found in the hash set.
If the lists contain duplicates your best bet is going to be to sort them both and then walk the lists in tandem ensuring that the elements at each point match up.
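Both strategies can be sketched as follows. The ListComparison class and its method names are hypothetical. The first method assumes neither list contains duplicates. The second assumes the element type is orderable, which for EmployeeObject would mean sorting by a key such as Id instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ListComparison
{
    // Assumes neither list contains duplicates. Uses the element type's
    // Equals/GetHashCode, so those must be overridden for value semantics.
    public static bool SameItemsNoDuplicates<T>(IList<T> a, IList<T> b)
    {
        if (a.Count != b.Count) return false;
        var set = new HashSet<T>(a);
        return b.All(set.Contains);
    }

    // Duplicates allowed: sort both lists, then compare them in tandem.
    public static bool SameItemsWithDuplicates<T>(IList<T> a, IList<T> b)
        where T : IComparable<T>
    {
        if (a.Count != b.Count) return false;
        return a.OrderBy(x => x).SequenceEqual(b.OrderBy(x => x));
    }
}
```

The hash set version is O(n) on average, while the sorting version is O(n log n) but handles repeated items correctly.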
You can use SequenceEqual provided you can actually compare instances of EmployeeObject. You probably have to override Equals (and GetHashCode with it) on EmployeeObject:
public override bool Equals(object o)
{
EmployeeObject obj = o as EmployeeObject;
if(obj == null) return false;
// Return true if all the properties match
return (Id == obj.Id &&
SubTitle == obj.SubTitle &&
Desc == obj.Desc &&
Active == obj.Active &&
ActiveDateTime == obj.ActiveDateTime);
}
Then you can do:
var same = db.SequenceEqual(xml);
You can also pass in a class that implements IEqualityComparer which instructs SequenceEqual how to compare each instance:
var same = db.SequenceEqual(xml, someComparer);
Another quick way, though not as fast, would be to build two enumerations of the value you want to compare, probably the id property in your case:
var ids1 = db.Select(i => i.Id); // List of all Ids in db
var ids2 = xml.Select(i => i.Id); // List of all Ids in xml
var same = ids1.SequenceEqual(ids2); // Both lists are the same
Consider the following class hierarchy:
public class Foo
{
public string Name { get; set; }
public int Value { get; set; }
}
public class Bar
{
public string Name { get; set; }
public IEnumerable<Foo> TheFoo { get; set; }
}
public class Host
{
public void Go()
{
IEnumerable<Bar> allBar = //Build up some large list
//Get Dictionary<Bar, Foo> with max foo value
}
}
What I would like to do using LINQ to Objects is to get a KeyValuePair where, for each Bar in the allBar collection, we select the Foo with the maximum Value property. Can this be done easily in a single LINQ statement?
Sure, although my preferred solution uses MaxBy from MoreLINQ:
var query = allBar.ToDictionary(x => x, // Key
x => x.TheFoo.MaxBy(f => f.Value));
Note that this will go pear-shaped if TheFoo is empty for any Bar instance.
Another way using Aggregate instead of OrderBy so that figuring out the max Foo is O(n) instead of O(n log n):
var query = allBar.ToDictionary(
bar => bar,
bar => bar.TheFoo.Aggregate(
(Foo)null, // the cast lets the compiler infer the accumulator type
(max, foo) => (max == null || foo.Value > max.Value) ? foo : max));
Just to add to Jon's comment about MaxBy going pear-shaped if you have no foos: you could do an OrderByDescending and then use FirstOrDefault to get the max element. If the collection is empty it would just return null instead of going "pear-shaped".
var foobars = allBar.ToDictionary(bar => bar,
bar => bar.TheFoo.OrderByDescending(foo => foo.Value).FirstOrDefault());
I don't think this would be as efficient as MaxBy, but it'd be more robust in the case of an empty collection.
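If you want both the single pass and the robustness for empty collections, a plain loop is one option. The sketch below uses hypothetical names (FooSelection, MaxFooPerBar, MaxFooOrNull) and returns null for a Bar whose TheFoo sequence is empty.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public string Name { get; set; }
    public int Value { get; set; }
}

public class Bar
{
    public string Name { get; set; }
    public IEnumerable<Foo> TheFoo { get; set; }
}

public static class FooSelection
{
    // Maps each Bar to its Foo with the largest Value (null if it has none).
    public static Dictionary<Bar, Foo> MaxFooPerBar(IEnumerable<Bar> allBar)
    {
        return allBar.ToDictionary(bar => bar, bar => MaxFooOrNull(bar.TheFoo));
    }

    // Single O(n) pass; the first Foo wins when several share the maximum.
    private static Foo MaxFooOrNull(IEnumerable<Foo> foos)
    {
        Foo max = null;
        foreach (var foo in foos)
        {
            if (max == null || foo.Value > max.Value)
            {
                max = foo;
            }
        }
        return max;
    }
}
```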