Lazily partition sequence with LINQ

I have the following extension method to find an element within a sequence, and then return two IEnumerable<T>s: one containing all the elements before that element, and one containing the element and everything that follows. I would prefer if the method were lazy, but I haven't figured out a way to do that. Can anyone come up with a solution?
public static PartitionTuple<T> Partition<T>(this IEnumerable<T> sequence, Func<T, bool> partition)
{
var a = sequence.ToArray();
return new PartitionTuple<T>
{
Before = a.TakeWhile(v => !partition(v)),
After = a.SkipWhile(v => !partition(v))
};
}
Doing sequence.ToArray() immediately defeats the laziness requirement. However, without that line, an expensive-to-iterate sequence may be iterated over twice. And, depending on what the calling code does, many more times.
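To make the double iteration concrete, here is a minimal sketch. CountingEnumerable<T> and CountingDemo are hypothetical helpers invented for this illustration (they are not part of the question's code); the wrapper simply counts how often GetEnumerator() is called on the source.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical test helper: counts how many times the wrapped sequence is enumerated.
public sealed class CountingEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _inner;
    public int EnumerationCount { get; private set; }

    public CountingEnumerable(IEnumerable<T> inner) { _inner = inner; }

    public IEnumerator<T> GetEnumerator()
    {
        EnumerationCount++;
        return _inner.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class CountingDemo
{
    // Partitions without any caching and reports how often the source was enumerated.
    public static int EnumerationsWithoutCaching()
    {
        var source = new CountingEnumerable<int>(new[] { 1, 2, 3, 4, 5 });
        Func<int, bool> partition = v => v == 3;

        var before = source.TakeWhile(v => !partition(v)).ToList(); // 1, 2
        var after = source.SkipWhile(v => !partition(v)).ToList();  // 3, 4, 5

        return source.EnumerationCount; // one enumeration per partition
    }
}
```

The same wrapper can be pointed at any of the answers below to check whether the source really is enumerated only once.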

You can use the Lazy object to ensure that the source sequence isn't converted to an array until one of the two partitions is iterated:
public static PartitionTuple<T> Partition<T>(
this IEnumerable<T> sequence, Func<T, bool> partition)
{
var lazy = new Lazy<IEnumerable<T>>(() => sequence.ToArray());
return new PartitionTuple<T>
{
Before = lazy.MapLazySequence(s => s.TakeWhile(v => !partition(v))),
After = lazy.MapLazySequence(s => s.SkipWhile(v => !partition(v)))
};
}
We'll use this method to defer evaluating the lazy until the sequence itself is iterated:
public static IEnumerable<TResult> MapLazySequence<TSource, TResult>(
this Lazy<IEnumerable<TSource>> lazy,
Func<IEnumerable<TSource>, IEnumerable<TResult>> filter)
{
foreach (var item in filter(lazy.Value))
yield return item;
}

This is an interesting problem and to get it right, you have to know what "right" is. For the semantics of the operation, I think that this definition makes sense:
The source sequence is only enumerated once even though the resulting sequences are enumerated several times.
The source sequence isn't enumerated until one of the results is enumerated.
Each of the results should be possible to enumerate independently.
If the source sequence changes, it is undefined what will happen.
I'm not entirely sure I got the handling of the matching object right, but I hope you get the idea. I'm deferring a lot of the work to the PartitionTuple<T> class to be able to be lazy.
public class PartitionTuple<T>
{
IEnumerable<T> source;
IList<T> before, after;
Func<T, bool> partition;
public PartitionTuple(IEnumerable<T> source, Func<T, bool> partition)
{
this.source = source;
this.partition = partition;
}
private void EnsureMaterialized()
{
if(before == null)
{
before = new List<T>();
after = new List<T>();
using(var enumerator = source.GetEnumerator())
{
bool hasCurrent;
// Everything before the first match goes into "before".
while((hasCurrent = enumerator.MoveNext()) && !partition(enumerator.Current))
{
before.Add(enumerator.Current);
}
// The matching element (if any) and everything after it go into "after".
while(hasCurrent)
{
after.Add(enumerator.Current);
hasCurrent = enumerator.MoveNext();
}
}
}
}
public IEnumerable<T> Before
{
get
{
EnsureMaterialized();
return before;
}
}
public IEnumerable<T> After
{
get
{
EnsureMaterialized();
return after;
}
}
}
public static class Extensions
{
public static PartitionTuple<T> Partition<T>(this IEnumerable<T> sequence, Func<T, bool> partition)
{
return new PartitionTuple<T>(sequence, partition);
}
}

Here's a generic solution that will memoize any IEnumerable<T> to ensure it's only iterated once, without forcing the whole thing to iterate:
public class MemoizedEnumerable<T> : IEnumerable<T>, IDisposable
{
private readonly IEnumerator<T> _childEnumerator;
private readonly List<T> _itemCache = new List<T>();
public MemoizedEnumerable(IEnumerable<T> enumerableToMemoize)
{
_childEnumerator = enumerableToMemoize.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return _itemCache.Concat(EnumerateOnce()).GetEnumerator();
}
public void Dispose()
{
_childEnumerator.Dispose();
}
private IEnumerable<T> EnumerateOnce()
{
while (_childEnumerator.MoveNext())
{
_itemCache.Add(_childEnumerator.Current);
yield return _childEnumerator.Current;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static class EnumerableExtensions
{
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> enumerable)
{
return new MemoizedEnumerable<T>(enumerable);
}
}
To use it for your partitioning problem, do this:
var memoized = sequence.Memoize();
return new PartitionTuple<T>
{
Before = memoized.TakeWhile(v => !partition(v)),
After = memoized.SkipWhile(v => !partition(v))
};
This will only iterate sequence a maximum of one time.
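One caveat: the class above appears to assume that enumerations do not overlap. If one consumer is still reading cached items while another appends to the cache, the List<T> enumerator behind Concat will throw. The following is a sketch of an index-based variant that tolerates interleaved enumerations; IndexedMemoizedEnumerable<T> and its member names are made up for this example, and it is not thread-safe.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Sketch of a memoizing wrapper that tolerates interleaved enumerations.
// Not thread-safe; the class and member names are illustrative only.
public sealed class IndexedMemoizedEnumerable<T> : IEnumerable<T>, IDisposable
{
    private readonly IEnumerable<T> _source;
    private readonly List<T> _cache = new List<T>();
    private IEnumerator<T> _enumerator; // created lazily on first use
    private bool _exhausted;

    public IndexedMemoizedEnumerable(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        _source = source;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // Read the cache by index, so a growing cache never invalidates this enumerator.
        for (int i = 0; ; i++)
        {
            if (i == _cache.Count && !TryFetchNext())
                yield break;
            yield return _cache[i];
        }
    }

    // Pulls one more item from the source into the cache, if any remain.
    private bool TryFetchNext()
    {
        if (_exhausted) return false;
        if (_enumerator == null) _enumerator = _source.GetEnumerator();
        if (_enumerator.MoveNext())
        {
            _cache.Add(_enumerator.Current);
            return true;
        }
        _exhausted = true;
        _enumerator.Dispose();
        return false;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public void Dispose()
    {
        // After disposal only the already cached items remain available.
        _exhausted = true;
        if (_enumerator != null) _enumerator.Dispose();
    }
}
```

Because the source enumerator is created only inside TryFetchNext, nothing is pulled from the source until one of the partitions is actually iterated.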

Generally, you return an object of a custom class that implements IEnumerable<T> but produces its results only when it is enumerated.
You can also implement IQueryable<T> (which inherits from IEnumerable<T>) instead, but that is mainly needed for building rich query functionality like that of LINQ to SQL, where the database query is executed only on the final enumeration request.

Related

Calling method with IEnumerable<T> sequence as argument, if that sequence is not empty

I have a method Foo, which does some CPU-intensive computation and returns an IEnumerable<T> sequence. I need to check whether that sequence is empty and, if it is not, call method Bar with that sequence as the argument.
I thought about three approaches...
Check whether the sequence is empty with Any(). This is fine if the sequence really is empty, which will be the case most of the time. But it will perform horribly if the sequence contains elements, because Foo will have to compute them again...
Convert the sequence to a list, check whether that list is empty... and pass it to Bar. This also has a limitation: Bar needs only the first x items, so Foo would be doing unnecessary work...
Check whether the sequence is empty without actually resetting the sequence. This sounds like a win-win, but I can't find any easy built-in way to do it. So I created this obscure workaround and am wondering whether it is really the best approach.
Condition
var source = Foo();
if (!IsEmpty(ref source))
Bar(source);
with IsEmpty implemented as
bool IsEmpty<T>(ref IEnumerable<T> source)
{
var enumerator = source.GetEnumerator();
if (enumerator.MoveNext())
{
source = CreateIEnumerable(enumerator);
return false;
}
return true;
IEnumerable<T> CreateIEnumerable(IEnumerator<T> usedEnumerator)
{
yield return usedEnumerator.Current;
while (usedEnumerator.MoveNext())
{
yield return usedEnumerator.Current;
}
}
}
Also note that calling Bar with an empty sequence is not an option...
EDIT:
After some consideration, the best answer for my case is the one from Olivier Jacot-Descombes: avoid that scenario completely. The accepted solution answers this question if there really is no other way.

I don't know whether your algorithm in Foo allows you to determine whether the enumeration will be empty without doing the calculations. If it does, return null when the sequence would be empty:
public IEnumerable<T> Foo()
{
if (<check if sequence will be empty>) {
return null;
}
return GetSequence();
}
private IEnumerable<T> GetSequence()
{
...
yield return item;
...
}
Note that if a method uses yield return, it cannot use a simple return to return null. Therefore a second method is needed.
var sequence = Foo();
if (sequence != null) {
Bar(sequence);
}
After reading one of your comments
Foo need to initialize some resources, parse XML file and fill some HashSets, which will be used to filter (yield) returned data.
I suggest another approach. The time consuming part seems to be the initialization. To be able to separate it from the iteration, create a foo calculator class. Something like:
public class FooCalculator<T>
{
private bool _isInitialized;
private string _file;
public FooCalculator(string file)
{
_file = file;
}
private void EnsureInitialized()
{
if (_isInitialized) return;
// Parse XML.
// Fill some HashSets.
_isInitialized = true;
}
public IEnumerable<T> Result
{
get {
EnsureInitialized();
...
yield return ...;
...
}
}
}
This ensures that the costly initialization stuff is executed only once. Now you can safely use Any().
Other optimizations are conceivable. The Result property could remember the position of the first returned element, so that if it is called again, it could skip to it immediately.

You would like to call some function Bar<T>(IEnumerable<T> source) if and only if the enumerable source contains at least one element, but you're running into two problems:
There is no method T Peek() in IEnumerable<T> so you would need to actually begin to evaluate the enumerable to see if it's nonempty, but...
You don't want to even partially double-evaluate the enumerable since setting up the enumerable might be expensive.
In that case your approach looks reasonable. You do, however, have some issues with your implementation:
You need to dispose enumerator after using it.
As pointed out by Ivan Stoev in comments, if the Bar() method attempts to evaluate the IEnumerable<T> more than once (e.g. by calling Any() then foreach (...)) then the results will be undefined because usedEnumerator will have been exhausted by the first enumeration.
To resolve these issues, I'd suggest modifying your API a little and create an extension method IfNonEmpty<T>(this IEnumerable<T> source, Action<IEnumerable<T>> func) that calls a specified method only if the sequence is nonempty, as shown below:
public static partial class EnumerableExtensions
{
public static bool IfNonEmpty<T>(this IEnumerable<T> source, Action<IEnumerable<T>> func)
{
if (source == null || func == null)
throw new ArgumentNullException();
using (var enumerator = source.GetEnumerator())
{
if (!enumerator.MoveNext())
return false;
func(new UsedEnumerator<T>(enumerator));
return true;
}
}
class UsedEnumerator<T> : IEnumerable<T>
{
IEnumerator<T> usedEnumerator;
public UsedEnumerator(IEnumerator<T> usedEnumerator)
{
if (usedEnumerator == null)
throw new ArgumentNullException();
this.usedEnumerator = usedEnumerator;
}
public IEnumerator<T> GetEnumerator()
{
var localEnumerator = System.Threading.Interlocked.Exchange(ref usedEnumerator, null);
if (localEnumerator == null)
// An attempt has been made to enumerate usedEnumerator more than once;
// throw an exception since this is not allowed.
throw new InvalidOperationException();
yield return localEnumerator.Current;
while (localEnumerator.MoveNext())
{
yield return localEnumerator.Current;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
Demo fiddle with unit tests here.

If you can change Bar, how about changing it to TryBar, which returns false when the IEnumerable<T> was empty?
bool TryBar(IEnumerable<Foo> source)
{
var count = 0;
foreach (var x in source)
{
count++;
}
return count > 0;
}
If that doesn't work for you, you could always create your own IEnumerable<T> wrapper that caches values after they have been iterated once.

One improvement for your IsEmpty would be to check if source is ICollection<T>, and if it is, check .Count (also, dispose the enumerator):
bool IsEmpty<T>(ref IEnumerable<T> source)
{
if (source is ICollection<T> collection)
{
return collection.Count == 0;
}
var enumerator = source.GetEnumerator();
if (enumerator.MoveNext())
{
source = CreateIEnumerable(enumerator);
return false;
}
enumerator.Dispose();
return true;
IEnumerable<T> CreateIEnumerable(IEnumerator<T> usedEnumerator)
{
yield return usedEnumerator.Current;
while (usedEnumerator.MoveNext())
{
yield return usedEnumerator.Current;
}
usedEnumerator.Dispose();
}
}
This will work for arrays and lists.
I would, however, rework IsEmpty to return the sequence itself, or null when it is empty:
IEnumerable<T> NotEmpty<T>(IEnumerable<T> source)
{
if (source is ICollection<T> collection)
{
if (collection.Count == 0)
{
return null;
}
return source;
}
var enumerator = source.GetEnumerator();
if (enumerator.MoveNext())
{
return CreateIEnumerable(enumerator);
}
enumerator.Dispose();
return null;
IEnumerable<T> CreateIEnumerable(IEnumerator<T> usedEnumerator)
{
yield return usedEnumerator.Current;
while (usedEnumerator.MoveNext())
{
yield return usedEnumerator.Current;
}
usedEnumerator.Dispose();
}
}
Now, you would check if it returned null.

The accepted answer is probably the best approach but, based on this part of the question, which I quote:
Convert the sequence to a list, check whether that list is empty... and pass it to Bar. This also has a limitation: Bar needs only the first x items, so Foo would be doing unnecessary work...
Another take would be creating an IEnumerable<T> that partially caches the underlying enumeration. Something along the following lines:
interface IDisposableEnumerable<T>
:IEnumerable<T>, IDisposable
{
}
static class PartiallyCachedEnumerable
{
public static IDisposableEnumerable<T> Create<T>(
IEnumerable<T> source,
int cachedCount)
{
if (source == null)
throw new ArgumentNullException(
nameof(source));
if (cachedCount < 1)
throw new ArgumentOutOfRangeException(
nameof(cachedCount));
return new partiallyCachedEnumerable<T>(
source, cachedCount);
}
private class partiallyCachedEnumerable<T>
: IDisposableEnumerable<T>
{
private readonly IEnumerator<T> enumerator;
private bool disposed;
private readonly List<T> cache;
private readonly bool hasMoreItems;
public partiallyCachedEnumerable(
IEnumerable<T> source,
int cachedCount)
{
Debug.Assert(source != null);
Debug.Assert(cachedCount > 0);
enumerator = source.GetEnumerator();
cache = new List<T>(cachedCount);
var count = 0;
// Check the count first so MoveNext() is not called (and an item lost) once the cache is full.
while (count < cachedCount &&
enumerator.MoveNext())
{
cache.Add(enumerator.Current);
count += 1;
}
hasMoreItems = !(count < cachedCount);
}
public void Dispose()
{
if (disposed)
return;
enumerator.Dispose();
disposed = true;
}
public IEnumerator<T> GetEnumerator()
{
foreach (var t in cache)
yield return t;
if (disposed)
yield break;
while (enumerator.MoveNext())
{
// Cache before yielding so the item is not lost if the consumer stops here.
cache.Add(enumerator.Current);
yield return enumerator.Current;
}
Dispose();
}
IEnumerator IEnumerable.GetEnumerator()
=> GetEnumerator();
}
}

Generic method to map objects of different types

I would like to write a generic method that maps a List to a new list, similar to JavaScript's map method. I would then use this method like this:
var words= new List<string>() { "Kočnica", "druga beseda", "tretja", "izbirni", "vodno bitje" };
List<object> wordsMapped = words.Map(el => new { cela = el, končnica = el.Končnica(5) });
I know there's the Select method, which does the same thing, but I need to write my own method. Right now I have this:
public static IEnumerable<object> SelectMy<T>(this IEnumerable<T> seznam, Predicate<T> predicate)
{
List<object> ret = new List<object>();
foreach (var el in seznam)
ret.Add(predicate(el));
return ret;
}
I also know I could use yield return, but again, I'm not allowed to. I think the problem is with undeclared types, so the compiler can't figure out how it should map the objects, but I don't know how to fix that. All the examples and tutorials I found map objects of the same type.

LINQ's Select is the equivalent of the map() function in other functional languages. The mapping function would typically not be called a predicate, IMO; a predicate would be a filter, which could reduce the collection.
You can certainly write an extension method that applies a projection to map input to output (either of which could be anonymous types):
public static IEnumerable<TO> Map<TI, TO>(this IEnumerable<TI> seznam,
Func<TI, TO> mapper)
{
foreach (var item in seznam)
yield return mapper(item);
}
Which is equivalent to
public static IEnumerable<TO> Map<TI, TO>(this IEnumerable<TI> seznam,
Func<TI, TO> mapper)
{
return seznam.Select(mapper);
}
And if you don't want a strong return type, you can leave the output type as object:
public static IEnumerable<object> Map<TI>(this IEnumerable<TI> seznam, Func<TI, object> mapper)
{
// Same implementation as above
}
And called like so:
var words = new List<string>() { "Kočnica", "druga beseda", "tretja", "izbirni", "vodno bitje" };
var wordsMapped = words.Map(el => new { cela = el, končnica = el.Končnica(5) });
Edit
If you enjoy the runtime thrills of dynamic languages, you could also use dynamic in place of object.
But using dynamic like this precludes the sugar of extension method syntax for methods like Končnica: Končnica would either need to be an instance method on all of the types used, or be invoked explicitly as a static method, e.g.
static class MyExtensions
{
public static int Končnica(this int i, int someInt)
{
return i;
}
public static Foo Končnica(this Foo f, int someInt)
{
return f;
}
public static string Končnica(this string s, int someInt)
{
return s;
}
}
And then, provided Končnica is defined for all of the item types in your input, you could invoke:
var things = new List<dynamic> // dynamic, so the Končnica overload is resolved at run time
{
"Kočnica", "druga beseda",
53,
new Foo()
};
var mappedThings = things.Map(el => new
{
cela = el,
končnica = MyExtensions.Končnica(el, 5)
// Or el.Končnica(5) IFF it is a method on all types, else run time errors ...
})
.ToList();

You can fix your code to work correctly like this:
public static IEnumerable<TResult> SelectMy<T, TResult>(this IEnumerable<T> seznam,
Func<T, TResult> mapping)
{
var ret = new List<TResult>();
foreach (var el in seznam)
{
ret.Add(mapping(el));
}
return ret;
}
Note that this is inefficient and problematic compared to typical Linq extensions, because it enumerates the entire input at once. If the input is an infinite series, you are in for a bad time.
It is possible to remedy this problem without the use of yield, but it would be somewhat lengthy. I think it would be ideal if you could tell us all why you are trying to do this task with two hands tied behind your back.
As a bonus, here is how you could implement this with the lazy evaluation benefits of yield without actually using yield. This should make it abundantly clear just how valuable yield is:
internal class SelectEnumerable<TIn, TResult> : IEnumerable<TResult>
{
private IEnumerable<TIn> BaseCollection { get; set; }
private Func<TIn, TResult> Mapping { get; set; }
internal SelectEnumerable(IEnumerable<TIn> baseCollection,
Func<TIn, TResult> mapping)
{
BaseCollection = baseCollection;
Mapping = mapping;
}
public IEnumerator<TResult> GetEnumerator()
{
return new SelectEnumerator<TIn, TResult>(BaseCollection.GetEnumerator(),
Mapping);
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
internal class SelectEnumerator<TIn, TResult> : IEnumerator<TResult>
{
private IEnumerator<TIn> Enumerator { get; set; }
private Func<TIn, TResult> Mapping { get; set; }
internal SelectEnumerator(IEnumerator<TIn> enumerator,
Func<TIn, TResult> mapping)
{
Enumerator = enumerator;
Mapping = mapping;
}
public void Dispose() { Enumerator.Dispose(); }
public bool MoveNext() { return Enumerator.MoveNext(); }
public void Reset() { Enumerator.Reset(); }
public TResult Current { get { return Mapping(Enumerator.Current); } }
object IEnumerator.Current { get { return Current; } }
}
internal static class MyExtensions
{
internal static IEnumerable<TResult> MySelect<TIn, TResult>(
this IEnumerable<TIn> enumerable,
Func<TIn, TResult> mapping)
{
return new SelectEnumerable<TIn, TResult>(enumerable, mapping);
}
}

The problem with your code is that Predicate<T> is a delegate that returns a boolean, which you're then trying to add to a List<object>.
Using a Func<T,object> is probably what you're looking for.
That being said, that code smells bad:
Converting to object is less than useful
Passing a delegate that maps T to an anonymous type won't help - you'll still get an object back which has no useful properties.
You probably want to add a TResult generic type parameter to your method, and take a Func<T, TResult> as an argument.
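A minimal sketch of that last suggestion, assuming an eager implementation without yield as the question requires. MapEager and MapDemo are names invented for this example. Because TResult is inferred from the lambda, the anonymous type keeps its properties instead of collapsing to object.

```csharp
using System;
using System.Collections.Generic;

public static class MapExtensions
{
    // Eager mapping without yield; TResult is inferred from the lambda,
    // so anonymous types keep their properties.
    public static List<TResult> MapEager<T, TResult>(
        this IEnumerable<T> source, Func<T, TResult> mapping)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (mapping == null) throw new ArgumentNullException(nameof(mapping));

        var result = new List<TResult>();
        foreach (var item in source)
            result.Add(mapping(item));
        return result;
    }
}

public static class MapDemo
{
    public static string Run()
    {
        var words = new List<string> { "tretja", "izbirni" };
        // var is required here because the element type is anonymous.
        var mapped = words.MapEager(w => new { Word = w, Length = w.Length });
        // Properties stay strongly typed and accessible.
        return mapped[0].Word + ":" + mapped[0].Length;
    }
}
```

Note that the result has to be stored in a var; declaring it as List<object> would not compile and would lose the property access anyway.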

Does Select() on a List lose track of the size of the collection?

In the following code, is the Select() method smart enough to keep the size of the list somewhere internally for the ToArray() method to be cheap?
List<Thing> bigList = someBigList;
var bigArray = bigList.Select(t => t.SomeField).ToArray();

That's easy to check, without looking at the implementation. Just create a class that implements IList<T>, and put a trace in the Count property:
class MyList<T> : IList<T>
{
private readonly IList<T> _list = new List<T>();
public IEnumerator<T> GetEnumerator()
{
return _list.GetEnumerator();
}
public void Add(T item)
{
_list.Add(item);
}
public void Clear()
{
_list.Clear();
}
public bool Contains(T item)
{
return _list.Contains(item);
}
public void CopyTo(T[] array, int arrayIndex)
{
_list.CopyTo(array, arrayIndex);
}
public bool Remove(T item)
{
return _list.Remove(item);
}
public int Count
{
get
{
Console.WriteLine ("Count accessed");
return _list.Count;
}
}
public bool IsReadOnly
{
get { return _list.IsReadOnly; }
}
public int IndexOf(T item)
{
return _list.IndexOf(item);
}
public void Insert(int index, T item)
{
_list.Insert(index, item);
}
public void RemoveAt(int index)
{
_list.RemoveAt(index);
}
public T this[int index]
{
get { return _list[index]; }
set { _list[index] = value; }
}
#region Implementation of IEnumerable
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
#endregion
}
If the Count property is accessed, this code should print "Count accessed":
var list = new MyList<int> { 1, 2, 3 };
var array = list.Select(x => x).ToArray();
But it doesn't print anything, so no, it doesn't keep track of the count. Of course there could be an optimization specific to List<T>, but it seems unlikely...

No, right now it does not (at least the .NET implementation). From the MS reference sources, Enumerable.ToArray is implemented as
public static TSource[] ToArray<TSource>(this IEnumerable<TSource> source) {
if (source == null) throw Error.ArgumentNull("source");
return new Buffer<TSource>(source).ToArray();
}
Buffer<TSource> creates a copy of the source sequence (in array form) on construction by iterating and resizing as necessary; it has a special "fast path" if source is an ICollection<TSource>, but the result of Enumerable.Select unsurprisingly does not implement that interface.
Be that as it may, apart from pure curiosity I don't think that this result means anything. For one, the implementation may change at any point in the future (even though a quick cost-benefit analysis won't find this likely). And in any case, you will suffer at most O(logN) reallocations. For small N the reallocations are not going to be noticeable. For large N, the amount of time spent on iterating over the collection is going to be O(N) and will therefore easily dominate.
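To put a number on the O(logN) claim, here is a small sketch that counts reallocations for a buffer that doubles whenever it is full. The starting capacity of 4 is an assumption made for illustration, not a statement about how Buffer<TSource> is actually sized; GrowthDemo is a made-up name.

```csharp
using System;

public static class GrowthDemo
{
    // Counts how often a buffer that starts at initialCapacity and doubles when full
    // has to be reallocated before it can hold n items.
    // The default initial capacity of 4 is an assumption for illustration only.
    public static int Reallocations(int n, int initialCapacity = 4)
    {
        long capacity = initialCapacity; // long avoids overflow for large n
        int count = 0;
        while (capacity < n)
        {
            capacity *= 2;
            count++;
        }
        return count;
    }
}
```

Under these assumptions, Reallocations(1000) is 8 and Reallocations(1000000) is 18, so even a million items cost fewer than twenty resizes.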

When you apply the Select operator to an enumerable sequence, it creates one of the following iterators:
WhereSelectArrayIterator
WhereSelectListIterator
WhereSelectEnumerableIterator
In the case of List<T>, a WhereSelectListIterator is created. It uses the list's enumerator to iterate over the list and apply the predicate and selector. This is its MoveNext implementation:
while (this.enumerator.MoveNext())
{
TSource current = this.enumerator.Current;
if ((this.predicate == null) || this.predicate(current))
{
base.current = this.selector(current);
return true;
}
}
As you can see, it does not keep track of the number of items that matched the predicate, so it does not know the count of items in the filtered sequence.
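A quick way to see this from the outside is to check whether the projected sequence still exposes a count through ICollection<T>. This is only a sketch: SelectShapeDemo is a made-up name, and the check says nothing about internal optimizations a particular runtime version may apply.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectShapeDemo
{
    // Reports whether the result of Select over a List<int> is still an ICollection<int>.
    public static bool SelectResultIsCollection()
    {
        var list = new List<int> { 1, 2, 3 };
        IEnumerable<int> projected = list.Select(x => x * 2);
        // The list itself is an ICollection<int>; the iterator returned by Select is not.
        return projected is ICollection<int>;
    }
}
```

On the implementations I'm aware of this returns false, which is why a consumer that only looks for ICollection<T> cannot presize its buffer.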

How to create IEnumerable<T> on which multiple enumerations are not possible?

When I enumerate over an IEnumerable twice, ReSharper complains about "Possible multiple enumerations of IEnumerable". I know that in some cases, such as DB queries, enumerating twice throws an exception.
I want to reproduce that behavior in tests. So, I basically want the following function to throw (because of multiple enumerations):
private void MultipleEnumerations(IEnumerable<string> enumerable)
{
MessageBox.Show(enumerable.Count().ToString());
MessageBox.Show(enumerable.Count().ToString());
}
What should I pass to it? All the Lists, Collections etc. are ok with multiple enumerations.
Even this kind of IEnumerable doesn't give an exception:
private IEnumerable<string> GetIEnumerable()
{
yield return "a";
yield return "b";
}
Thanks.

You probably just want a custom class:
public class OneShotEnumerable<T> : IEnumerable<T>
{
private readonly IEnumerable<T> _source;
private bool _shouldThrow = false;
public OneShotEnumerable(IEnumerable<T> source)
{
this._source = source;
}
public IEnumerator<T> GetEnumerator()
{
if (_shouldThrow) throw new InvalidOperationException();
_shouldThrow = true;
return _source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}

Create your own class that implements IEnumerable<T> and throw an exception if GetEnumerator() is called twice (use a boolean instance field).
Alternatively, create an iterator that uses a flag field to ensure that it cannot be called twice (enumerating an iterator twice will execute the entire method twice).
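The second option could look roughly like this. OneShotSource and OneShotDemo are hypothetical names for the sketch. The flag lives in the object that owns the iterator method, so it is shared by every enumeration of the returned sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class OneShotSource
{
    private bool _started;

    public IEnumerable<string> Items()
    {
        // This runs on the first MoveNext() of each enumeration, not when Items() is called.
        if (_started)
            throw new InvalidOperationException("This sequence has already been enumerated.");
        _started = true;

        yield return "a";
        yield return "b";
    }
}

public static class OneShotDemo
{
    // Returns true if the first enumeration succeeds and the second one throws.
    public static bool SecondEnumerationThrows()
    {
        IEnumerable<string> items = new OneShotSource().Items();
        int first = items.Count();
        try
        {
            items.Count();
            return false;
        }
        catch (InvalidOperationException)
        {
            return first == 2;
        }
    }
}
```

Keep in mind that the exception surfaces on the first MoveNext() of the second enumeration rather than at the GetEnumerator() call, which is usually good enough for a test.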

The custom class, which I've copied from John Gietzen's answer (with a couple of corrections), could usefully be combined with an extension method to create a really simple way to do this.
public class OneShotEnumerable<T> : IEnumerable<T>
{
private readonly IEnumerable<T> source;
private bool shouldThrow = false;
public OneShotEnumerable(IEnumerable<T> source)
{
this.source = source;
}
public IEnumerator<T> GetEnumerator()
{
if (shouldThrow)
throw new InvalidOperationException("This enumerable has already been enumerated.");
shouldThrow = true;
return this.source.GetEnumerator();
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static class OneShotEnumerableExtension
{
public static IEnumerable<T> SingleUse<T>(this IEnumerable<T> source)
{
#if (DEBUG)
return new OneShotEnumerable<T>(source);
#else
return source;
#endif
}
}
Then you can pass something to your previous method by simply doing
MultipleEnumerations(MyEnumerable.SingleUse());

C#: A good and efficient implementation of IEnumerable<T>.HasDuplicates

Does anyone have a good and efficient extension method for finding if a sequence of items has any duplicates?
I guess I could put return subjects.Distinct().Count() != subjects.Count() into an extension method, but it kind of feels like there should be a better way. That method would have to count the elements twice and sort out all the distinct elements. A better implementation should return true on the first duplicate it finds. Any good suggestions?
I imagine the outline could be something like this:
public static bool HasDuplicates<T>(this IEnumerable<T> subjects)
{
return subjects.HasDuplicates(EqualityComparer<T>.Default);
}
public static bool HasDuplicates<T>(this IEnumerable<T> subjects, IEqualityComparer<T> comparer)
{
...
}
But not quite sure how a smart implementation of it would be...

public static bool HasDuplicates<T>(this IEnumerable<T> subjects)
{
return HasDuplicates(subjects, EqualityComparer<T>.Default);
}
public static bool HasDuplicates<T>(this IEnumerable<T> subjects, IEqualityComparer<T> comparer)
{
HashSet<T> set = new HashSet<T>(comparer);
foreach (T item in subjects)
{
if (!set.Add(item))
return true;
}
return false;
}
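To illustrate the early exit and the comparer overload, here is a small sketch. The check is restated as a plain static method so the block compiles on its own; DuplicateDemo and Cycle are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateDemo
{
    // Same HashSet-based check as above, restated so this block compiles on its own.
    public static bool HasDuplicates<T>(IEnumerable<T> items, IEqualityComparer<T> comparer = null)
    {
        var seen = new HashSet<T>(comparer); // null selects the default comparer
        foreach (var item in items)
        {
            if (!seen.Add(item))
                return true; // stop at the first duplicate
        }
        return false;
    }

    // Endless sequence 0, 1, 2, 0, 1, 2, ...; pulled[0] counts the items actually consumed.
    public static IEnumerable<int> Cycle(int[] pulled)
    {
        for (int i = 0; ; i++)
        {
            pulled[0]++;
            yield return i % 3;
        }
    }
}
```

With var pulled = new int[1], the call DuplicateDemo.HasDuplicates(DuplicateDemo.Cycle(pulled)) returns true after consuming only four items, even though the sequence never ends. Passing StringComparer.OrdinalIgnoreCase makes "a" and "A" count as duplicates, whereas the default comparer treats them as distinct.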

This is in production code. Works great:
public static bool HasDuplicates<T>(this IEnumerable<T> sequence, IEqualityComparer<T> comparer = null) {
var set = new HashSet<T>(comparer);
return !sequence.All(item => set.Add(item));
}

I think the simplest extension method is the following.
public static bool HasDuplicates<T>(this IEnumerable<T> enumerable) {
var hs = new HashSet<T>();
foreach ( var cur in enumerable ) {
if ( !hs.Add(cur) ) {
return true;
}
}
return false;
}
