How to find the difference between 2 IEnumerable objects - c#

I have 2 list of guids as:
IEnumerable<dynamic> userids = null;
IEnumerable<dynamic> lsCheckedUsers = null;
The userids and lsCheckedUsers list are populated from a SQL database using dapper.
I now wish to find all userids that are not in lsCheckedUsers.
I have tried the following
var userdifference = userids.Where(i => !lsCheckedUsers.Contains(lsCheckedUsers));
var userdifference = userids.Except(lsCheckedUsers);
None of the above actual returns the difference between the 2.
How do I get the difference of guids that do not exist in both.
I am certain that lsCheckedUsers has Guids that are in userids

This is correct:
var userdifference = userids.Except(lsCheckedUsers);
It will work if both of your IEnumerable<dynamic> actually contain Guids. Print out or inspect the items in each to make sure they are Guids.
You should really be using IEnumerable<Guid> and cast the incoming items to Guids if this is what you are expecting. It will hopefully prevent errors like the one you are potentially seeing.

Something along those lines..
var difference = list1.Where (e => !list2.Any(a => a == e))

You have:
var userdifference = userids.Where(i => !lsCheckedUsers.Contains(lsCheckedUsers));
But I think you mean:
var userdifference = userids.Where(i => !lsCheckedUsers.Contains(i));
Update:
To everyone marking down these answers because of "reference" comparisons, consider that Guid is a value type so its equality is evaluated differently. Try this simple test to convince yourself:
var guid = Guid.NewGuid();
var guids = new[] { new Guid(guid.ToString()) };
Console.WriteLine(guids.Contains(guid));
You'll see that the result is True.

Enumerable has an Except method
Enumerable.Except Method (IEnumerable, IEnumerable)
And use String or GUID.
It will compare values for equals.
HashSet ExceptWith would probably have better performance.
But cannot use HashSet if you need to allow duplicates.
HashSet.ExceptWith Method

Related

Retrieving non-duplicates from 2 Collections using LINQ

Background: I have two Collections of different types of objects with different name properties (both strings). Objects in Collection1 have a field called Name, objects in Collection2 have a field called Field.
I needed to compare these 2 properties, and get items from Collection1 where there is not a match in Collection2 based on that string property (Collection1 will always have a greater or equal number of items. All items should have a matching item by Name/Field in Collection2 when finished).
The question: I've found answers using Lists and they have helped me a little(for what it's worth, I'm using Collections). I did find this answer which appears to be working for me, however I would like to convert what I've done from query syntax (if that's what it's called?) to a LINQ query. See below:
//Query for results. This code is what I'm specifically trying to convert.
var result = (from item in Collection1
where !Collection2.Any(x => x.ColumnName == item.FieldName)
select item).ToList();
//** Remove items in result from Collection1**
//...
I'm really not at all familiar with either syntax (working on it), but I think I generally understand what this is doing. I'm struggling trying to convert this to LINQ syntax though and I'd like to learn both of these options rather than some sort of nested loop.
End goal after I remove the query results from Collection1: Collection1.Count == Collection2 and the following is true for each item in the collection: ItemFromCollection1.Name == SomeItemFromCollection2.Field (if that makes sense...)
You can convert this to LINQ methods like this:
var result = Collection1.Where(item => !Collection2.Any(x => x.ColumnName == item.FieldName))
.ToList();
Your first query is the opposite of what you asked for. It's finding records that don't have an equivalent. The following will return all records in Collection1 where there is an equivalent:
var results=Collection1.Where(c1=>!Collection2.Any(c2=>c2.Field==c1.Name));
Please note that this isn't the fastest approach, especially if there is a large number of records in collection2. You can find ways of speeding it up through HashSets or Lookups.
if you want to get a list of non duplicate values to be retained then do the following.
List<string> listNonDup = new List<String>{"6","1","2","4","6","5","1"};
var singles = listNonDup.GroupBy(n => n)
.Where(g => g.Count() == 1)
.Select(g => g.Key).ToList();
Yields: 2, 4, 5
if you want a list of all the duplicate values then you can do the opposite
var duplicatesxx = listNonDup.GroupBy(s => s)
.SelectMany(g => g.Skip(1)).ToList();

Return an IQueryable<dynamic> to filter in the parent method

For example, I have this code:
IQueryable<MyModel> q = new List<MyModel>().AsQueryable(); // this is just an example, this is obviously not a list
var query = from item in q select new { item.Property };
var oneItem = query.FirstOrDefault(x => x.SomeProperty == somevalue);
var allItems = query.ToArray();
Now in a bit more complex situation, I need to get oneItem and allItems in two different methods. So to follow DRY, i'd like to move my query to a private method and then in the consuming ones just call this.GetQuery().FirstOrDefault() or .ToArray() as required.
However, when I try to have the method as IQueryable<dynamic> I get the 'An expression tree may not contain a dynamic operation' error. If I change it to IQueryable<object> then my filtering in the oneItem doesn't work.
You need to return
IQueryable<MyObject>
You can make your methods/classes dry by using genrics eg
IQuerable<T> GetQueryable()
Then the consumer can specify what T should be and your away.
You can't use dynamic with linq. See here to understand why.
For two methods to communicate they must understand the same type so you really want to project into a named type.
If you insist on using dynamic programming it can be done but you will need a lot of casting because dynamic is not a type but just a way of treating object:
IQueryable<MyModel> q = new List<MyModel>().AsQueryable(); // this is just an example, this is obviously not a list
IQueryable<object> query = from item in q select (object)new { item.Property };
var oneItem = query.FirstOrDefault(x => ((dynamic)x).SomeProperty == somevalue);
object[] allItems = query.ToArray();

How do I use Linq with a HashSet of Integers to pull multiple items from a list of Objects?

I have a HashSet of ID numbers, stored as integers:
HashSet<int> IDList; // Assume that this is created with a new statement in the constructor.
I have a SortedList of objects, indexed by the integers found in the HashSet:
SortedList<int,myClass> masterListOfMyClass;
I want to use the HashSet to create a List as a subset of the masterListOfMyclass.
After wasting all day trying to figure out the Linq query, I eventually gave up and wrote the following, which works:
public List<myclass> SubSet {
get {
List<myClass> xList = new List<myClass>();
foreach (int x in IDList) {
if (masterListOfMyClass.ContainsKey(x)) {
xList.Add(masterListOfMyClass[x]);
}
}
return xList;
}
private set { }
}
So, I have two questions here:
What is the appropriate Linq query? I'm finding Linq extremely frustrating to try to figuere out. Just when I think I've got it, it turns around and "goes on strike".
Is a Linq query any better -- or worse -- than what I have written here?
var xList = IDList
.Where(masterListOfMyClass.ContainsKey)
.Select(x => masterListOfMyClass[x])
.ToList();
If your lists both have equally large numbers of items, you may wish to consider inverting the query (i.e. iterate through masterListOfMyClass and query IDList) since a HashSet is faster for random queries.
Edit:
It's less neat, but you could save a lookup into masterListOfMyClass with the following query, which would be a bit faster:
var xList = IDList
.Select(x => { myClass y; masterListOfMyClass.TryGetValue(x, out y); return y; })
.Where(x => x != null)
.ToList();
foreach (int x in IDList.Where(x => masterListOfMyClass.ContainsKey(x)))
{
xList.Add(masterListOfMyClass[x]);
}
This is the appropriate linq query for your loop.
Here the linq query will not effective in my point of view..
Here is the Linq expression:
List<myClass> xList = masterListOfMyClass
.Where(x => IDList.Contains(x.Key))
.Select(x => x.Value).ToList();
There is no big difference in the performance in such a small example, Linq is slower in general, it actually uses iterations under the hood too. The thing you get with ling is, imho, clearer code and the execution is defered until it is needed. Not i my example though, when I call .ToList().
Another option would be (which is intentionally the same as Sankarann's first answer)
return (
from x in IDList
where masterListOfMyClass.ContainsKey(x)
select masterListOfMyClass[x]
).ToList();
However, are you sure you want a List to be returned? Usually, when working with IEnumerable<> you should chain your calls using IEnumerable<> until the point where you actually need the data. There you can decide to e.g. loop once (use the iterator) or actually pull the data in some sort of cache using the ToList(), ToArray() etc. methods.
Also, exposing a List<> to the public implies that modifying this list has an impact on the calling class. I would leave it to the user of the property to decide to make a local copy or continue using the IEnumerable<>.
Second, as your private setter is empty, setting the 'SubSet' has no impact on the functionality. This again is confusing and I would avoid it.
An alternate (an maybe less confusing) declaration of your property might look like this
public IEnumerable<myclass> SubSet {
get {
return from x in IDList
where masterListOfMyClass.ContainsKey(x)
select masterListOfMyClass[x]
}
}

Intersect two collections which contain different types

Suppose I have one collection, call it ids it is of type IEnumerable<string>, I have a second collection call it objects it's of type MyObject[]. MyObject has a string property called id. I would like a LINQ statement that returns all off the objects in the objects collection who's id matches any value in the ids collection. ids will be a strict subset of objects.Select(x => x.id). Meaning, for every string in ids I know there will be exactly one corresponding MyObject in objects. Can someone post a pure LINQ solution? I've tried a couple things with no luck. I can come up with an iterative solution easily enough so unless it's impossible to do with only LINQ please don't post any.
"Just" LINQ:
var r = obj.Where(o => ids.Any(id => id == o.id));
But better, for larger n, with a set:
var hs = new HashSet(ids);
var r = obj.Where(o => hs.Contains(o.id));
I think this is pretty straightforward with query syntax.
It would look something like:
var a = from o in objects
join i in ids on o.id equals i
select o;
If you just want a list of MyObject that match, you can do :
var solution = objects.Where(x=> ids.Contains(x.id));
With this instead, you'll get a List<T> where T is an Anonymous type with 2 properties, Id that is the string that work as "key" in this specific case, and Obj, a list of MyObject which id correspond to the Id property.
var solution = ids.Select(x=>new{ Id = x, Obj=objects.Where(y=>y.id == x).ToList()})
.ToList();
If you just want to know if there is any object in the intersection (which was what I was looking for)
Based on this
var a = from o in objects
join i in ids on o.id equals i
select o;
You can do this as well
var isEmpty = objects.Any(x => ids.Any(y => y == x.ToString()));
The accepted answer is correct. However, if someone doesn't like using SQL style LINQ, here is the LINQ extension method approach to solving the same problem.
var filteredObjects = objects.Join(ids, obj => obj.Id, id => id, (obj, _) => obj);
We are joining two different types, so the 2nd & 3rd Join parameter signify that join will be made on id.
The fourth parameter is used to select an object out of the resultant (obj, id) pair after applying join.

LINQ flavored IS IN Query

There are quite a few other questions similiar to this but none of them seem to do what I'm trying to do. I'd like pass in a list of string and query
SELECT ownerid where sysid in ('', '', '') -- i.e. List<string>
or like
var chiLst = new List<string>();
var parRec = Lnq.attlnks.Where(a => a.sysid IN chiList).Select(a => a.ownerid);
I've been playing around with a.sysid.Contains() but haven't been able to get anywhere.
Contains is the way forward:
var chiLst = new List<string>();
var parRec = Lnq.attlnks.Where(a => chiList.Contains(a.sysid))
.Select(a => a.ownerid);
Although you'd be better off with a HashSet<string> instead of a list, in terms of performance, given all the contains checks. (That's assuming there will be quite a few entries... for a small number of values, it won't make much difference either way, and a List<string> may even be faster.)
Note that the performance aspect is assuming you're using LINQ to Objects for this - if you're using something like LINQ to SQL, it won't matter as the Contains check won't be done in-process anyway.
You wouldn't call a.sysid.Contains; the syntax for IN (SQL) is the reverse of the syntax for Contains (LINQ)
var parRec = Lnq.attlnks.Where(a => chiList.Contains(a.sysid))
.Select(a => a.ownerid);
In addition to the Contains approach, you could join:
var parRec = from a in Lnq.attlnks
join sysid in chiLst
on a.sysid equals sysid
select a.ownerid
I'm not sure whether this will do better than Contains with a HashSet, but it will at least have similar performance. It will certainly do better than using Contains with a list.

Categories

Resources