Check if one IEnumerable contains all elements of another IEnumerable - c#

What is the fastest way to determine if one IEnumerable contains all the elements of another IEnumerable when comparing a field/property of each element in both collections?
public class Item
{
public string Value;
public Item(string value)
{
Value = value;
}
}
//example usage
Item[] List1 = {new Item("1"),new Item("a")};
Item[] List2 = {new Item("a"),new Item("b"),new Item("c"),new Item("1")};
bool Contains(IEnumerable<Item> list1, IEnumerable<Item> list2)
{
var list1Values = list1.Select(item => item.Value);
var list2Values = list2.Select(item => item.Value);
return //are ALL of list1Values in list2Values?
}
Contains(List1,List2) // should return true
Contains(List2,List1) // should return false

There is no "fast way" to do this unless you track and maintain some state that determines whether all values in one collection are contained in another. If you only have IEnumerable<T> to work against, I would use Intersect.
var allOfList1IsInList2 = list1.Intersect(list2).Count() == list1.Count();
The performance of this should be very reasonable, since Intersect() will enumerate over each list just once. Also, the second call to Count() will be optimal if the underlying type is an ICollection<T> rather than just an IEnumerable<T>.

You could also use Except to remove from the first list all values that exist in the second list, and then check if all values have been removed:
var allOfList1IsInList2 = !list1.Except(list2).Any();
This method has the advantage of not requiring two calls to Count().

.NET 3.5+ (C# 3.0+)
Using Enumerable.All<TSource> to determine if all List2 items are contained in List1:
bool hasAll = list2Uris.All(itm2 => list1Uris.Contains(itm2));
This will also work when list1 contains even more than all the items of list2.
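A note on cost: Contains on a plain IEnumerable<T> is a linear scan, so All combined with Contains can take time proportional to the product of the two lengths. A minimal sketch of one way around that, assuming it is acceptable to copy the keys of the larger collection into a HashSet first (the names SubsetCheck and ContainsAllBy are hypothetical, not part of LINQ):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SubsetCheck
{
    // Hypothetical helper: true when every key selected from `candidates`
    // also occurs among the keys selected from `pool`.
    public static bool ContainsAllBy<T, TKey>(
        IEnumerable<T> candidates,
        IEnumerable<T> pool,
        Func<T, TKey> keySelector)
    {
        // Build the lookup once; membership tests are then O(1) on average.
        var poolKeys = new HashSet<TKey>(pool.Select(keySelector));

        // All() stops at the first candidate whose key is missing.
        return candidates.All(item => poolKeys.Contains(keySelector(item)));
    }
}
```

For the Item class from the question, the call would look like SubsetCheck.ContainsAllBy(List1, List2, item => item.Value). The trade-off is extra memory for the set in exchange for fewer comparisons.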

Kent's answer is fine and short, but the solution that he provides always requires iteration over the whole first collection. Here is the source code:
public static IEnumerable<TSource> Intersect<TSource>(this IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
if (first == null)
throw Error.ArgumentNull("first");
if (second == null)
throw Error.ArgumentNull("second");
return Enumerable.IntersectIterator<TSource>(first, second, comparer);
}
private static IEnumerable<TSource> IntersectIterator<TSource>(IEnumerable<TSource> first, IEnumerable<TSource> second, IEqualityComparer<TSource> comparer)
{
Set<TSource> set = new Set<TSource>(comparer);
foreach (TSource source in second)
set.Add(source);
foreach (TSource source in first)
{
if (set.Remove(source))
yield return source;
}
}
That is not always required. So, here is my solution:
public static bool Contains<T>(this IEnumerable<T> source, IEnumerable<T> subset, IEqualityComparer<T> comparer)
{
var hashSet = new HashSet<T>(subset, comparer);
if (hashSet.Count == 0)
{
return true;
}
foreach (var item in source)
{
hashSet.Remove(item);
if (hashSet.Count == 0)
{
break;
}
}
return hashSet.Count == 0;
}
Actually, you should think about using ISet<T> (HashSet<T>). It contains all required set methods. IsSubsetOf in your case.
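As a rough illustration of the IsSubsetOf suggestion, assuming the values have already been projected to strings as in the question (the class and method names here are made up for the example):

```csharp
using System.Collections.Generic;

public static class SetSubsetCheck
{
    // Hypothetical helper: true when every value in list1Values
    // also appears in list2Values.
    public static bool IsSubset(
        IEnumerable<string> list1Values,
        IEnumerable<string> list2Values)
    {
        // Duplicates in list1Values collapse when the set is built.
        var subset = new HashSet<string>(list1Values);

        // HashSet<T>.IsSubsetOf accepts any IEnumerable<T>.
        return subset.IsSubsetOf(list2Values);
    }
}
```

Because the first collection becomes a set, repeated values in it do not affect the result.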

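The solution marked as the answer fails when the first list contains repeated values, because Intersect returns distinct elements and the counts no longer match. It only passes if your IEnumerable contains distinct values.
The code below handles two lists with repetitions by comparing their distinct values. Note that it checks whether both lists hold the same set of values, not merely whether one contains the other: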
int aCount = a.Distinct().Count();
int bCount = b.Distinct().Count();
return aCount == bCount &&
a.Intersect(b).Count() == aCount;
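To make the repetition problem concrete, here is a small sketch that puts the Intersect/Count check and the Except check side by side. The wrapper names are hypothetical; the LINQ calls are the ones quoted in the answers above.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicateDemo
{
    // The Intersect/Count approach: Intersect yields each common value once,
    // so a repeated value in list1 makes the two counts differ.
    public static bool ViaIntersectCount(IEnumerable<string> list1, IEnumerable<string> list2)
    {
        return list1.Intersect(list2).Count() == list1.Count();
    }

    // The Except approach: nothing is left over when every value of list1 is in list2.
    public static bool ViaExcept(IEnumerable<string> list1, IEnumerable<string> list2)
    {
        return !list1.Except(list2).Any();
    }
}
```

With list1 = { "a", "a" } and list2 = { "a", "b" }, ViaIntersectCount reports false while ViaExcept reports true, even though every element of list1 is present in list2.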

You should use a HashSet instead of an array.
Example:
List1.SetEquals(List2); // returns true if both collections contain exactly the same elements, regardless of their order
The only HashSet limitation is that we can't get an item by index as with a List, nor by key as with a Dictionary. All you can do is enumerate the items (foreach, while, etc.).

The LINQ operator SequenceEqual would also work, but only when both enumerables contain the same items in the same order:
return list1Uris.SequenceEqual(list2Uris);

Another way is to convert your superset list to a HashSet and use the IsSupersetOf method of HashSet.
bool Contains(IEnumerable<Item> list1, IEnumerable<Item> list2)
{
var list1Values = list1.Select(item => item.Value);
var list2Values = list2.Select(item => item.Value).ToHashSet();
return list2Values.IsSupersetOf(list1Values);
}

Related

What is the best way to trim a list?

I have a List of strings. It's being generated elsewhere, but I will generate it below to help describe this simplified example:
var list = new List<string>();
list.Add("Joe");
list.Add("");
list.Add("Bill");
list.Add("Bill");
list.Add("");
list.Add("Scott");
list.Add("Joe");
list.Add("");
list.Add("");
list = TrimList(list);
I would like a function that "trims" this list, and by trim I mean removing all items at the end of the list that are blank strings (the final two in this case).
NOTE: I still want to keep the blank one that is the second item in the list (or any other one that is not at the end), so I can't just do a .Where(r => !String.IsNullOrEmpty(r))
I would just write it without any LINQ, to be honest. After all, you're modifying a collection rather than just querying it:
void TrimList(List<string> list)
{
int lastNonEmpty = list.FindLastIndex(x => !string.IsNullOrEmpty(x));
int firstToRemove = lastNonEmpty + 1;
list.RemoveRange(firstToRemove, list.Count - firstToRemove);
}
If you actually want to create a new list, then the LINQ-based solutions are okay... although potentially somewhat inefficient (as Reverse has to buffer everything).
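The FindLastIndex/RemoveRange approach relies on two edge cases working out: FindLastIndex returns -1 when no element matches, which makes the removal start at index 0, and RemoveRange accepts a start index equal to Count when the count to remove is 0. A small sketch for trying those cases, with the logic copied from TrimList above and a hypothetical Trimmed wrapper added only for convenience:

```csharp
using System.Collections.Generic;

public static class TrimListEdgeCases
{
    // Same logic as TrimList above, repeated so the sketch is self-contained.
    public static void TrimTrailingEmpty(List<string> list)
    {
        // -1 when every element is null or empty (or the list is empty).
        int lastNonEmpty = list.FindLastIndex(x => !string.IsNullOrEmpty(x));
        int firstToRemove = lastNonEmpty + 1;

        // Removes zero elements when the last item is already non-empty.
        list.RemoveRange(firstToRemove, list.Count - firstToRemove);
    }

    // Hypothetical convenience wrapper: copies the input, trims the copy, returns it.
    public static List<string> Trimmed(params string[] items)
    {
        var copy = new List<string>(items);
        TrimTrailingEmpty(copy);
        return copy;
    }
}
```

Trimmed("", "") comes back empty, Trimmed("Joe", "", "Bill") is left unchanged, and Trimmed("Joe", "", "") keeps only "Joe".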
Take advantage of Reverse and SkipWhile.
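// List<T>.Reverse() reverses in place and returns void, so call the LINQ version explicitly.
list = Enumerable.Reverse(list).SkipWhile(s => String.IsNullOrEmpty(s)).Reverse().ToList();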
List<T> (not the interface) has a FindLastIndex method. Therefore you can wrap that in a method:
static IList<string> TrimList(List<string> input) {
return input.Take(input.FindLastIndex(x => !string.IsNullOrEmpty(x)) + 1)
.ToList();
}
This produces a copy, whereas Jon's modifies the list.
The only solution I can think of is to code a loop that starts at the end of the list and searches for an element that is not an empty string. Don't know of any library functions that would help. Once you know the last good element, you know which ones to remove.
Be careful not to modify the collection while you are iterating over it. Tends to break the iterator.
I always like to come up with the most generic solution possible. Why restrict yourself with lists and strings? Let's make an algorithm for generic enumerable!
public static class EnumerableExtensions
{
public static IEnumerable<T> TrimEnd<T>(this IEnumerable<T> enumerable, Predicate<T> predicate)
{
if (predicate == null)
{
throw new ArgumentNullException("predicate");
}
var accumulator = new LinkedList<T>();
foreach (var item in enumerable)
{
if (predicate(item))
{
accumulator.AddLast(item);
}
else
{
foreach (var accumulated in accumulator)
{
yield return accumulated;
}
accumulator.Clear();
yield return item;
}
}
}
}
Use it like this:
var list = new[]
{
"Joe",
"",
"Bill",
"Bill",
"",
"Scott",
"Joe",
"",
""
};
foreach (var item in list.TrimEnd(string.IsNullOrEmpty))
{
Console.WriteLine(item);
}

How can I efficiently determine if an IEnumerable has more than one element?

Given an initialised IEnumerable:
IEnumerable<T> enumerable;
I would like to determine if it has more than one element. I think the most obvious way to do this is:
enumerable.Count() > 1
However, I believe Count() enumerates the whole collection, which is unnecessary for this use case. For example, if the collection contains a very large number of elements or gets its data from an external source, this could be quite wasteful in terms of performance.
How can I do this without enumerating any more than 2 elements?
You can test this in many ways by combining the extension methods in System.Linq... Two simple examples are below:
bool twoOrMore = enumerable.Skip(1).Any();
bool twoOrMoreOther = enumerable.Take(2).Count() == 2;
I prefer the first one since a common way to check whether Count() >= 1 is with Any() and therefore I find it more readable.
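One way to convince yourself that Skip(1).Any() stops early is to feed it a lazy sequence that records how many elements were requested. This is only a sketch; CountingSequence and TwoOrMoreDemo are hypothetical names introduced for the experiment.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical probe: a lazy sequence that records how many elements were requested.
public sealed class CountingSequence
{
    public int Pulled { get; private set; }

    public IEnumerable<int> Items(int length)
    {
        for (int i = 0; i < length; i++)
        {
            Pulled++;          // one more element handed to the consumer
            yield return i;
        }
    }
}

public static class TwoOrMoreDemo
{
    // Number of elements Skip(1).Any() pulls from a lazy source of the given length.
    public static int PulledBySkipAny(int length)
    {
        var probe = new CountingSequence();
        bool twoOrMore = probe.Items(length).Skip(1).Any();
        return probe.Pulled;
    }

    // Number of elements Count() > 1 pulls from the same kind of source.
    public static int PulledByCount(int length)
    {
        var probe = new CountingSequence();
        bool twoOrMore = probe.Items(length).Count() > 1;
        return probe.Pulled;
    }
}
```

For a source of 1000 elements, PulledBySkipAny should report 2, while PulledByCount reports 1000.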
For the fun of it, call MoveNext() twice on the enumerator, then get another IEnumerable.
Or, write a small wrapper class for this specific goal: EnumerablePrefetcher : IEnumerable<T> to try and fetch the specified amount of items upon initialization.
Its IEnumerable<T> GetItems() method should use yield return in this fashion
foreach (T item in prefetchedItems) // array of T, prefetched and decided if IEnumerable has at least n elements
{
yield return item;
}
foreach (T item in otherItems) // IEnumerable<T>
{
yield return item;
}
@Cameron-S's solution is simpler, but the one below is more efficient. I based it on the Enumerable.Count() method. Skip() always iterates and does not short-circuit to read the source's count for ICollection or ICollection<T> types.
/// <summary>
/// Returns true if source has at least <paramref name="count"/> elements efficiently.
/// </summary>
/// <remarks>Based on int Enumerable.Count() method.</remarks>
public static bool HasCountOfAtLeast<TSource>(this IEnumerable<TSource> source, int count)
{
source.ThrowIfArgumentNull("source");
var collection = source as ICollection<TSource>;
if (collection != null)
{
return collection.Count >= count;
}
var collection2 = source as ICollection;
if (collection2 != null)
{
return collection2.Count >= count;
}
int num = 0;
checked
{
using (var enumerator = source.GetEnumerator())
{
while (enumerator.MoveNext())
{
num++;
if (num >= count)
{
return true;
}
}
}
}
// returns true for source with 0 elements and count 0
return num == count;
}
I had a similar need, but to get the single value from an IEnumerable if it only has a single value. I made an extension method for it:
public static S OneOnlyOrDefault<S>(this IEnumerable<S> items)
{
var rtn = default(S);
using (var en = items.GetEnumerator())
{
if (en.MoveNext())
{
rtn = en.Current;
}
if (en.MoveNext())
{
rtn = default(S);
}
}
return rtn;
}
To answer the question "does this collection contain only one item?", you could do the following (where the collection contains reference types in this case):
if (myList.OneOnlyOrDefault() == null)
{
// list is either empty or contains more than one item
}

Check if IEnumerable has ANY rows without enumerating over the entire list

I have the following method which returns an IEnumerable of type T. The implementation of the method is not important, apart from the yield return to lazy load the IEnumerable. This is necessary as the result could have millions of items.
public IEnumerable<T> Parse()
{
foreach(...)
{
yield return parsedObject;
}
}
Problem:
I have the following property which can be used to determine if the IEnumerable will have any items:
public bool HasItems
{
get
{
return Parse().Take(1).SingleOrDefault() != null;
}
}
Is there perhaps a better way to do this?
IEnumerable.Any() will return true if there are any elements in the sequence and false if there are no elements in the sequence. This method will not iterate the entire sequence (only maximum one element) since it will return true if it makes it past the first element and false if it does not.
As in "Howto: Count the items from an IEnumerable<T> without iterating?", an enumerable is meant to be a lazy, forward-only "list", and, as in quantum mechanics, the act of investigating it alters its state.
See confirmation: https://dotnetfiddle.net/GPMVXH
var sideeffect = 0;
var enumerable = Enumerable.Range(1, 10).Select(i => {
// show how many times it happens
sideeffect++;
return i;
});
// will 'enumerate' one item!
if(enumerable.Any()) Console.WriteLine("There are items in the list; sideeffect={0}", sideeffect);
enumerable.Any() is the cleanest way to check if there are any items in the list. You could try casting to something not lazy, like if(null != (list = enumerable as ICollection<T>) && list.Any()) return true.
Or, your scenario may permit using an Enumerator and making a preliminary check before enumerating:
// dispose the enumerator when done
using (var e = enumerable.GetEnumerator())
{
// check first
if(!e.MoveNext()) return;
// do some stuff, then enumerate the list
do {
actOn(e.Current); // do stuff with the current item
} while(e.MoveNext()); // stop when we don't have anything else
}
The best way to answer this question, and to clear all doubts, is to see what the 'Any' function does.
public static bool Any<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
using (IEnumerator<TSource> e = source.GetEnumerator()) {
if (e.MoveNext()) return true;
}
return false;
}
https://github.com/microsoft/referencesource/blob/master/System.Core/System/Linq/Enumerable.cs

Why cannot use iterator block with IOrderedEnumerable

I wrote this:
using System;
using System.Collections.Generic;
using System.Linq;
static class MyExtensions
{
public static IEnumerable<T> Inspect<T> (this IEnumerable<T> source)
{
Console.WriteLine ("In Inspect");
//return source; //Works, but does nothing
foreach(T item in source){
Console.WriteLine(item);
yield return item;
}
}
}
Then went to test it with this:
var collection = Enumerable.Range(-5, 11)
.Select(x => new { Original = x, Square = x * x })
.Inspect()
.OrderBy(x => x.Square)
//.Inspect()
.ThenBy(x => x.Original)
;
foreach (var element in collection)
{
Console.WriteLine(element);
}
The first use of Inspect() works fine. The second one, commented out, won't compile. The return of OrderBy is IOrderedEnumerable. I'd have thought IOrderedEnumerable is-a IEnumerable but, rolling with the punches, I tried:
public static IOrderedEnumerable<T> Inspect<T> (this IOrderedEnumerable<T> source)
{
Console.WriteLine ("In Inspect (ordered)");
foreach(T item in source){
Console.WriteLine(item);
yield return item;
}
}
But this won't compile either. I get told I cannot have an iterator block because System.Linq.IOrderedEnumerable is not an iterator interface type.
What am I missing? I cannot see why people wouldn't want to iterate over an ordered collection the same way they do with the raw collection.
(Using Mono 2.10.8.1, which is effectively C# 4.0, and MonoDevelop 2.8.6.3)
UPDATE:
As joshgo kindly pointed out, I can take an input parameter of IOrderedEnumerable, it does indeed act as-a IEnumerable. But to iterate I must return IEnumerable, and my original error was caused by ThenBy, which insists on being given IOrderedEnumerable. Very reasonable too. But is there a way to satisfy ThenBy here?
UPDATE2:
After playing with the code in both answers (both of which were very helpful), I finally understood why I can't use yield with an IOrderedEnumerable return: there is no point, because the values have to be fully available in order to do the sort. So instead of a loop with yield in it, I may as well use a loop to print out all items, then just return source once at the end.
I believe an explanation of the error can be found here: Some help understanding "yield"
Quoting Lasse V. Karlsen:
A method using yield return must be declared as returning one of the
following two interfaces: IEnumerable or IEnumerator
The issue seems to be with the yield statement and the return type of your second function, IOrderedEnumerable.
If you change the return type from IOrderedEnumerable to IEnumerable, then the 2nd Inspect() call will no longer be an error. However, the ThenBy() call will now throw an error. If you temporarily comment it out, it'll compile but you do lose access to the ThenBy() method.
var collection = Enumerable.Range(-5, 11)
.Select(x => new { Original = x, Square = x * x })
.Inspect()
.OrderBy(x => x.Square)
.Inspect()
//.ThenBy(x => x.Original)
;
foreach (var element in collection)
{
Console.WriteLine(element);
}
...
public static IEnumerable<T> Inspect<T> (this IOrderedEnumerable<T> source)
{
Console.WriteLine ("In Inspect (ordered)");
foreach(T item in source){
Console.WriteLine(item);
yield return item;
}
}
If you want to apply your extension method after an operation that returns IOrderedEnumerable and then continue ordering, you need to create a second, overloaded extension method:
public static IOrderedEnumerable<T> Inspect<T>(this IOrderedEnumerable<T> source)
{
Console.WriteLine("In Ordered Inspect");
// inspected items will be unordered
Func<T, int> selector = item => {
Console.WriteLine(item);
return 0; };
return source.CreateOrderedEnumerable(selector, null, false);
}
A few things are worth noting here:
You need to return IOrderedEnumerable in order to apply ThenBy or ThenByDescending.
An IOrderedEnumerable cannot be created via yield return. In your case it can be created from source.
You should create a dummy selector that does not break the ordering of the items.
The output will not show the items in sorted order, because the selector is executed in the same order as the input sequence.
If you want to see the ordered items, you need to execute your OrderedEnumerable. This forces execution of all the operators that come before Inspect:
public static IOrderedEnumerable<T> Inspect<T>(this IOrderedEnumerable<T> source)
{
Console.WriteLine("In Ordered Inspect");
var enumerable = source.CreateOrderedEnumerable(x => 0, null, false);
// each time you apply Inspect all query until this operator will be executed
foreach(var item in enumerable)
Console.WriteLine(item);
return enumerable;
}
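Another option, offered only as a sketch, is a small wrapper class that implements IOrderedEnumerable<T> itself. It forwards CreateOrderedEnumerable to the wrapped sequence, so ThenBy keeps working, and it reports each element lazily when the final sequence is enumerated. The type InspectedOrderedEnumerable and the Action<T> callback parameter are assumptions made for this example, not part of the original Inspect method.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: keeps the IOrderedEnumerable<T> contract and
// reports each element as the final sequence is enumerated.
public sealed class InspectedOrderedEnumerable<T> : IOrderedEnumerable<T>
{
    private readonly IOrderedEnumerable<T> _inner;
    private readonly Action<T> _onItem;

    public InspectedOrderedEnumerable(IOrderedEnumerable<T> inner, Action<T> onItem)
    {
        if (inner == null) throw new ArgumentNullException("inner");
        if (onItem == null) throw new ArgumentNullException("onItem");
        _inner = inner;
        _onItem = onItem;
    }

    // ThenBy and ThenByDescending call this; wrapping the result keeps the inspection alive.
    public IOrderedEnumerable<T> CreateOrderedEnumerable<TKey>(
        Func<T, TKey> keySelector, IComparer<TKey> comparer, bool descending)
    {
        return new InspectedOrderedEnumerable<T>(
            _inner.CreateOrderedEnumerable(keySelector, comparer, descending), _onItem);
    }

    public IEnumerator<T> GetEnumerator()
    {
        foreach (T item in _inner)
        {
            _onItem(item);   // report, then pass the item along
            yield return item;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class InspectExtensions
{
    public static IOrderedEnumerable<T> Inspect<T>(
        this IOrderedEnumerable<T> source, Action<T> onItem)
    {
        return new InspectedOrderedEnumerable<T>(source, onItem);
    }
}
```

A call such as .OrderBy(x => x.Square).Inspect(x => Console.WriteLine(x)).ThenBy(x => x.Original) then compiles. One consequence of this design is that the callback sees the elements in the fully sorted order, including the later ThenBy, rather than the order after OrderBy alone.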

Checking if a list is empty with LINQ

What's the "best" (taking both speed and readability into account) way to determine if a list is empty? Even if the list is of type IEnumerable<T> and doesn't have a Count property.
Right now I'm tossing up between this:
if (myList.Count() == 0) { ... }
and this:
if (!myList.Any()) { ... }
My guess is that the second option is faster, since it'll come back with a result as soon as it sees the first item, whereas the first option (for an IEnumerable) will need to visit every item to return the count.
That being said, does the second option look as readable to you? Which would you prefer? Or can you think of a better way to test for an empty list?
Edit: @lassevk's response seems to be the most logical, coupled with a bit of runtime checking to use a cached count if possible, like this:
public static bool IsEmpty<T>(this IEnumerable<T> list)
{
if (list is ICollection<T>) return ((ICollection<T>)list).Count == 0;
return !list.Any();
}
You could do this:
public static Boolean IsEmpty<T>(this IEnumerable<T> source)
{
if (source == null)
return true; // or throw an exception
return !source.Any();
}
Edit: Note that simply using the .Count method will be fast if the underlying source actually has a fast Count property. A valid optimization above would be to detect a few base types and simply use the .Count property of those, instead of the .Any() approach, but then fall back to .Any() if no guarantee can be made.
I would make one small addition to the code you seem to have settled on: also check for the non-generic ICollection, as it is implemented even by some non-obsolete generic classes (e.g., Queue<T> and Stack<T>). I would also use as instead of is, as it's more idiomatic and has been shown to be faster.
public static bool IsEmpty<T>(this IEnumerable<T> list)
{
if (list == null)
{
throw new ArgumentNullException("list");
}
var genericCollection = list as ICollection<T>;
if (genericCollection != null)
{
return genericCollection.Count == 0;
}
var nonGenericCollection = list as ICollection;
if (nonGenericCollection != null)
{
return nonGenericCollection.Count == 0;
}
return !list.Any();
}
LINQ itself must be doing some serious optimization around the Count() method somehow.
Does this surprise you? I imagine that for IList implementations, Count simply reads the number of elements directly while Any has to query the IEnumerable.GetEnumerator method, create an instance and call MoveNext at least once.
/EDIT @Matt:
I can only assume that the Count() extension method for IEnumerable is doing something like this:
Yes, of course it does. This is what I meant. Actually, it uses ICollection instead of IList but the result is the same.
I just wrote up a quick test, try this:
IEnumerable<Object> myList = new List<Object>();
Stopwatch watch = new Stopwatch();
int x;
watch.Start();
for (var i = 0; i <= 1000000; i++)
{
if (myList.Count() == 0) x = i;
}
watch.Stop();
Stopwatch watch2 = new Stopwatch();
watch2.Start();
for (var i = 0; i <= 1000000; i++)
{
if (!myList.Any()) x = i;
}
watch2.Stop();
Console.WriteLine("myList.Count() = " + watch.ElapsedMilliseconds.ToString());
Console.WriteLine("myList.Any() = " + watch2.ElapsedMilliseconds.ToString());
Console.ReadLine();
The second is almost three times slower :)
Trying the stopwatch test again with a Stack, an array, or other scenarios, it seems the result really depends on the type of list, because those tests show Count to be slower.
So I guess it depends on the type of list you're using!
(Just to point out, I put 2000+ objects in the List and Count was still faster, the opposite of the other types.)
List.Count is O(1) according to Microsoft's documentation:
http://msdn.microsoft.com/en-us/library/27b47ht3.aspx
So just use List.Count == 0; it's much faster than a query.
This is because it has a data member called Count which is updated any time something is added or removed from the list, so when you call List.Count it doesn't have to iterate through every element to get it, it just returns the data member.
The second option is much quicker if you have multiple items.
Any() returns as soon as 1 item is found.
Count() has to keep going through the entire list.
For instance suppose the enumeration had 1000 items.
Any() would check the first one, then return true.
Count() would return 1000 after traversing the entire enumeration.
This is potentially worse if you use one of the predicate overloads: Count() still has to check every single item, even if there is only one match.
You get used to using the Any one - it does make sense and is readable.
One caveat - if you have a List, rather than just an IEnumerable then use that list's Count property.
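The point about the predicate overloads can be checked by counting how often the predicate is invoked. The following is a sketch with made-up helper names; only Any and Count are real LINQ methods.

```csharp
using System.Linq;

public static class PredicateCallCounter
{
    // How many times Any(predicate) invokes the predicate before returning.
    public static int CallsMadeByAny(int[] data, int target)
    {
        int calls = 0;
        bool found = data.Any(x => { calls++; return x == target; });
        return calls;
    }

    // How many times Count(predicate) invokes the predicate for the same question.
    public static int CallsMadeByCount(int[] data, int target)
    {
        int calls = 0;
        bool found = data.Count(x => { calls++; return x == target; }) > 0;
        return calls;
    }
}
```

With an array holding the numbers 0 to 999 and a target of 0, CallsMadeByAny returns 1 because the first element already matches, while CallsMadeByCount returns 1000.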
@Konrad, what surprises me is that in my tests, I'm passing the list into a method that accepts IEnumerable<T>, so the runtime can't optimize it by calling the Count() extension method for IList<T>.
I can only assume that the Count() extension method for IEnumerable is doing something like this:
public static int Count<T>(this IEnumerable<T> list)
{
if (list is IList<T>) return ((IList<T>)list).Count;
int i = 0;
foreach (var t in list) i++;
return i;
}
... in other words, a bit of runtime optimization for the special case of IList<T>.
/EDIT @Konrad: +1 mate, you're right about it more likely being on ICollection<T>.
Ok, so what about this one?
public static bool IsEmpty<T>(this IEnumerable<T> enumerable)
{
// dispose the enumerator once the check is done
using (var enumerator = enumerable.GetEnumerator())
{
return !enumerator.MoveNext();
}
}
EDIT: I've just realized that someone has sketched this solution already. It was mentioned that the Any() method will do this, but why not do it yourself? Regards
Another idea:
if(enumerable.FirstOrDefault() != null)
However I like the Any() approach more.
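One reason to prefer Any() here: FirstOrDefault() cannot tell an empty sequence apart from a sequence whose first element is the default value (null for reference types, 0 for int, and so on). A small sketch with hypothetical helper names:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class FirstOrDefaultPitfall
{
    // The FirstOrDefault-based check from above, wrapped for testing.
    public static bool LooksNonEmpty(IEnumerable<string> source)
    {
        return source.FirstOrDefault() != null;
    }

    // The Any-based check, which only asks whether an element exists.
    public static bool IsNonEmpty(IEnumerable<string> source)
    {
        return source.Any();
    }
}
```

For a sequence such as { null, "a" }, LooksNonEmpty reports false although the sequence has two elements, while IsNonEmpty reports true.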
This was critical to get this to work with Entity Framework:
var genericCollection = list as ICollection<T>;
if (genericCollection != null)
{
//your code
}
If I check with Count(), LINQ executes a "SELECT COUNT(*)..." in the database, but I only need to check whether the results contain data, so I resolved this by using FirstOrDefault() instead of Count():
Before
var cfop = from tabelaCFOPs in ERPDAOManager.GetTable<TabelaCFOPs>()
select tabelaCFOPs;
if (cfop.Count() > 0)
{
var itemCfop = cfop.First();
//....
}
After
var cfop = from tabelaCFOPs in ERPDAOManager.GetTable<TabelaCFOPs>()
select tabelaCFOPs;
var itemCfop = cfop.FirstOrDefault();
if (itemCfop != null)
{
//....
}
private bool NullTest<T>(T[] list, string attribute)
{
bool status = false;
if (list != null)
{
int flag = 0;
var property = GetProperty(list.FirstOrDefault(), attribute);
foreach (T obj in list)
{
if (property.GetValue(obj, null) == null)
flag++;
}
status = flag == 0 ? true : false;
}
return status;
}
public PropertyInfo GetProperty<T>(T obj, string str)
{
Expression<Func<T, string, PropertyInfo>> GetProperty = (TypeObj, Column) => TypeObj.GetType().GetProperty(TypeObj
.GetType().GetProperties().ToList()
.Find(property => property.Name
.ToLower() == Column
.ToLower()).Name.ToString());
return GetProperty.Compile()(obj, str);
}
Here's my implementation of Dan Tao's answer, allowing for a predicate:
public static bool IsEmpty<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
if (source == null) throw new ArgumentNullException();
if (IsCollectionAndEmpty(source)) return true;
return !source.Any(predicate);
}
public static bool IsEmpty<TSource>(this IEnumerable<TSource> source)
{
if (source == null) throw new ArgumentNullException();
if (IsCollectionAndEmpty(source)) return true;
return !source.Any();
}
private static bool IsCollectionAndEmpty<TSource>(IEnumerable<TSource> source)
{
var genericCollection = source as ICollection<TSource>;
if (genericCollection != null) return genericCollection.Count == 0;
var nonGenericCollection = source as ICollection;
if (nonGenericCollection != null) return nonGenericCollection.Count == 0;
return false;
}
List<T> li = new List<T>();
(li.First().DefaultValue.HasValue) ? string.Format("{0:yyyy/MM/dd}", sender.First().DefaultValue.Value) : string.Empty;
myList.ToList().Count == 0. That's all
This extension method works for me:
public static bool IsEmpty<T>(this IEnumerable<T> enumerable)
{
try
{
enumerable.First();
return false;
}
catch (InvalidOperationException)
{
return true;
}
}
