Creating a collection with a function to obtain the next member - c#

I need to accumulate values into a collection, based on an arbitrary function. Each value is derived from calling a function on the previous value.
My current attempt:
public static T[] Aggregate<T>(this T source, Func<T, T> func)
{
var arr = new List<T> { };
var current = source;
while(current != null)
{
arr.Add(current);
current = func(current);
};
return arr.ToArray();
}
Is there a built-in .Net Framework function to do this?

This operation is usually called Unfold. There's no built-in version but it is implemented in FSharp.Core, so you could wrap that:
public static IEnumerable<T> Unfold<T, TState>(TState init, Func<TState, T> gen)
{
var liftF = new Converter<TState, Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>>(x =>
{
var r = gen(x);
if (r == null)
{
return Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>.None;
}
else
{
return Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>.Some(Tuple.Create(r, x));
}
});
var ff = Microsoft.FSharp.Core.FSharpFunc<TState, Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>>.FromConverter(liftF);
return Microsoft.FSharp.Collections.SeqModule.Unfold<TState, T>(ff, init);
}
public static IEnumerable<T> Unfold<T>(T source, Func<T, T> func)
{
return Unfold<T>(source, func);
}
however writing your own version would be simpler:
public static IEnumerable<T> Unfold<T>(T source, Func<T, T> func)
{
T current = source;
while(current != null)
{
yield return current;
current = func(current);
}
}

You are referring to an anamorphism as mentioned here linq-unfold-operator, which is the dual of a catamorphism.
Unfold is the dual of Aggregate. Aggregate exists in the .Net Framework; Unfold does not (for some unknown reason). Hence your confusion.
/// seeds: the initial data to unfold
/// stop: if stop(seed) is True, don't go any further
/// map: transform the seed into the final data
/// next: generate the next seed value from the current seed
public static IEnumerable<R> UnFold<T,R>(this IEnumerable<T> seeds, Predicate<T> stop,
Func<T,R> map, Func<T,IEnumerable<T>> next) {
foreach (var seed in seeds) {
if (!stop(seed)) {
yield return map(seed);
foreach (var val in next(seed).UnFold(stop, map, next))
yield return val;
}
}
}
Usage Example:
var parents = new[]{someType}.UnFold(t => t == null, t => t,
t => t.GetInterfaces().Concat(new[]{t.BaseType}))
.Distinct();

Related

How to build a sequence using a fluent interface?

I'm trying to using a fluent interface to build a collection, similar to this (simplified) example:
var a = StartWith(1).Add(2).Add(3).Add(4).ToArray();
/* a = int[] {1,2,3,4}; */
The best solution I can come up with add Add() as:
IEnumerable<T> Add<T>(this IEnumerable<T> coll, T item)
{
foreach(var t in coll) yield return t;
yield return item;
}
Which seems to add a lot of overhead that going to be repeated in each call.
IS there a better way?
UPDATE:
in my rush, I over-simplified the example, and left out an important requirement. The last item in the existing coll influences the next item. So, a slightly less simplified example:
var a = StartWith(1).Times10Plus(2).Times10Plus(3).Times10Plus(4).ToArray();
/* a = int[] {1,12,123,1234}; */
public static IEnumerable<T> StartWith<T>(T x)
{
yield return x;
}
static public IEnumerable<int> Times10Plus(this IEnumerable<int> coll, int item)
{
int last = 0;
foreach (var t in coll)
{
last = t;
yield return t;
}
yield return last * 10 + item;
}
A bit late to this party, but here are a couple ideas.
First, consider solving the more general problem:
public static IEnumerable<A> AggregateSequence<S, A>(
this IEnumerable<S> items,
A initial,
Func<A, R, A> f)
{
A accumulator = initial;
yield return accumulator;
foreach(S item in items)
{
accumulator = f(accumulator, item);
yield return accumulator;
}
}
And now your program is just new[]{2, 3, 4}.AggregateSequence(1,
(a, s) => a * 10 + s).ToArray()
However that lacks the "fluency" you want and it assumes that the same operation is applied to every element in the sequence.
You are right to note that deeply nested iterator blocks are problematic; they have quadratic performance in time and linear consumption of stack, both of which are bad.
Here's an entertaining way to implement your solution efficiently.
The problem is that you need both cheap access to the "right" end of the sequence, in order to do an operation on the most recently added element, but you also need cheap access to the left end of the sequence to enumerate it. Normal queues and stacks only have cheap access to one end.
Therefore: start by implementing an efficient immutable double-ended queue. This is a fascinating datatype; I have an implementation here using finger trees:
https://blogs.msdn.microsoft.com/ericlippert/2008/01/22/immutability-in-c-part-10-a-double-ended-queue/
https://blogs.msdn.microsoft.com/ericlippert/2008/02/12/immutability-in-c-part-eleven-a-working-double-ended-queue/
Once you have that, your operations are one-liners:
static IDeque<T> StartWith<T>(T t) => Deque<T>.Empty.EnqueueRight(t);
static IDeque<T> Op<T>(this IDeque<T> d, Func<T, T> f) => d.EnqueueRight(f(d.PeekRight()));
static IDeque<int> Times10Plus(this IDeque<int> d, int j) => d.Op(i => i * 10 + j);
Modify IDeque<T> and Deque<T> to implement IEnumerable<T> in the obvious way and you then get ToArray for free. Or do it as an extension method:
static IEnumerable<T> EnumerateFromLeft(this IDeque<T> d)
{
var c = d;
while (!c.IsEmpty)
{
yield return c.PeekLeft();
c = c.DequeueLeft();
}
}
You could do the following:
public static class MySequenceExtensions
{
public static IReadOnlyList<int> Times10Plus(
this IReadOnlyList<int> sequence,
int value) => Add(sequence,
value,
v => sequence[sequence.Count - 1] * 10 + v);
public static IReadOnlyList<T> Starts<T>(this T first)
=> new MySequence<T>(first);
public static IReadOnlyList<T> Add<T>(
this IReadOnlyList<T> sequence,
T item,
Func<T, T> func)
{
var mySequence = sequence as MySequence<T> ??
new MySequence<T>(sequence);
return mySequence.AddItem(item, func);
}
private class MySequence<T>: IReadOnlyList<T>
{
private readonly List<T> innerList;
public MySequence(T item)
{
innerList = new List<T>();
innerList.Add(item);
}
public MySequence(IEnumerable<T> items)
{
innerList = new List<T>(items);
}
public T this[int index] => innerList[index];
public int Count => innerList.Count;
public MySequence<T> AddItem(T item, Func<T, T> func)
{
Debug.Assert(innerList.Count > 0);
innerList.Add(func(item));
return this;
}
public IEnumerator<T> GetEnumerator() => innerList.GetEnumerator();
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
}
Note that I'm using IReadOnlyList to make it possible to index into the list in a performant way and be able to get the last element if needed. If you need a lazy enumeration then I think you are stuck with your original idea.
And sure enough, the following:
var a = 1.Starts().Times10Plus(2).Times10Plus(3).Times10Plus(4).ToArray();
Produces the expected result ({1, 12, 123, 1234}) with, what I think is, reasonable performance.
You can do like this:
public interface ISequence
{
ISequenceOp StartWith(int i);
}
public interface ISequenceOp
{
ISequenceOp Times10Plus(int i);
int[] ToArray();
}
public class Sequence : ISequence
{
public ISequenceOp StartWith(int i)
{
return new SequenceOp(i);
}
}
public class SequenceOp : ISequenceOp
{
public List<int> Sequence { get; set; }
public SequenceOp(int startValue)
{
Sequence = new List<int> { startValue };
}
public ISequenceOp Times10Plus(int i)
{
Sequence.Add(Sequence.Last() * 10 + i);
return this;
}
public int[] ToArray()
{
return Sequence.ToArray();
}
}
An then just:
var x = new Sequence();
var a = x.StartWith(1).Times10Plus(2).Times10Plus(3).Times10Plus(4).ToArray();

Generic method to map objects of different types

I would like to write Generic Method that would map List to new list, similar to JS's map method. I would then use this method like this:
var words= new List<string>() { "Kočnica", "druga beseda", "tretja", "izbirni", "vodno bitje" };
List<object> wordsMapped = words.Map(el => new { cela = el, končnica = el.Končnica(5) });
I know there's Select method which does the same thing but I need to write my own method. Right now I have this:
public static IEnumerable<object> SelectMy<T>(this IEnumerable<T> seznam, Predicate<T> predicate)
{
List<object> ret = new List<object>();
foreach (var el in seznam)
ret.Add(predicate(el));
return ret;
}
I also know I could use yield return but again I mustn't. I think the problem is with undeclared types and compiler can't figure out how it should map objects but I don't know how to fix that. All examples and tutorials I found map object of same types.
Linq's Select is the equivalent of the map() function in other functional languages. The mapping function would typically not be called Predicate, IMO - predicate would be a filter which could reduce the collection.
You can certainly wrap an extension method which would apply a projection to map input to output (either of which could be be anonymous types):
public static IEnumerable<TO> Map<TI, TO>(this IEnumerable<TI> seznam,
Func<TI, TO> mapper)
{
foreach (var item in seznam)
yield return mapper(item);
}
Which is equivalent to
public static IEnumerable<TO> Map<TI, TO>(this IEnumerable<TI> seznam,
Func<TI, TO> mapper)
{
return seznam.Select(mapper);
}
And if you don't want a strong return type, you can leave the output type as object
public static IEnumerable<object> Map<TI>(this IEnumerable<TI> seznam, Func<TI, object> mapper)
{
// Same implementation as above
And called like so:
var words = new List<string>() { "Kočnica", "druga beseda", "tretja", "izbirni", "vodno bitje" };
var wordsMapped = words.Map(el => new { cela = el, končnica = el.Končnica(5) });
Edit
If you enjoy the runtime thrills of dynamic languages, you could also use dynamic in place of object.
But using dynamic like this so this precludes the using the sugar of extension methods like Končnica - Končnica would either need to be a method on all of the types utilized, or be invoked explicitly, e.g.
static class MyExtensions
{
public static int Končnica(this int i, int someInt)
{
return i;
}
public static Foo Končnica(this Foo f, int someInt)
{
return f;
}
public static string Končnica(this string s, int someInt)
{
return s;
}
}
And then, provided all items in your input implemented Končnica you could invoke:
var things = new List<object>
{
"Kočnica", "druga beseda",
53,
new Foo()
};
var mappedThings = things.Map(el => new
{
cela = el,
končnica = MyExtensions.Končnica(el, 5)
// Or el.Končnica(5) IFF it is a method on all types, else run time errors ...
})
.ToList();
You can fix your code to work correctly like this:
public static IEnumerable<TResult> SelectMy<T, TResult>(this IEnumerable<T> seznam,
Func<T, TResult> mapping)
{
var ret = new List<TResult>();
foreach (var el in seznam)
{
ret.Add(mapping(el));
}
return ret;
}
Note that this is inefficient and problematic compared to typical Linq extensions, because it enumerates the entire input at once. If the input is an infinite series, you are in for a bad time.
It is possible to remedy this problem without the use of yield, but it would be somewhat lengthy. I think it would be ideal if you could tell us all why you are trying to do this task with two hands tied behind your back.
As a bonus, here is how you could implement this with the lazy evaluation benefits of yield without actually using yield. This should make it abundantly clear just how valuable yield is:
internal class SelectEnumerable<TIn, TResult> : IEnumerable<TResult>
{
private IEnumerable<TIn> BaseCollection { get; set; }
private Func<TIn, TResult> Mapping { get; set; }
internal SelectEnumerable(IEnumerable<TIn> baseCollection,
Func<TIn, TResult> mapping)
{
BaseCollection = baseCollection;
Mapping = mapping;
}
public IEnumerator<TResult> GetEnumerator()
{
return new SelectEnumerator<TIn, TResult>(BaseCollection.GetEnumerator(),
Mapping);
}
IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
internal class SelectEnumerator<TIn, TResult> : IEnumerator<TResult>
{
private IEnumerator<TIn> Enumerator { get; set; }
private Func<TIn, TResult> Mapping { get; set; }
internal SelectEnumerator(IEnumerator<TIn> enumerator,
Func<TIn, TResult> mapping)
{
Enumerator = enumerator;
Mapping = mapping;
}
public void Dispose() { Enumerator.Dispose(); }
public bool MoveNext() { return Enumerator.MoveNext(); }
public void Reset() { Enumerator.Reset(); }
public TResult Current { get { return Mapping(Enumerator.Current); } }
object IEnumerator.Current { get { return Current; } }
}
internal static class MyExtensions
{
internal static IEnumerable<TResult> MySelect<TIn, TResult>(
this IEnumerable<TIn> enumerable,
Func<TIn, TResult> mapping)
{
return new SelectEnumerable<TIn, TResult>(enumerable, mapping);
}
}
The problem with your code is that Predicate<T> is a delegate that returns a boolean, which you're then trying to add to a List<object>.
Using a Func<T,object> is probably what you're looking for.
That being said, that code smells bad:
Converting to object is less than useful
Passing a delegate that maps T to an anonymous type won't help - you'll still get an object back which has no useful properties.
You probably want to add a TResult generic type parameter to your method, and take a Func<T, TResult> as an argument.

Lazily partition sequence with LINQ

I have the following extension method to find an element within a sequence, and then return two IEnumerable<T>s: one containing all the elements before that element, and one containing the element and everything that follows. I would prefer if the method were lazy, but I haven't figured out a way to do that. Can anyone come up with a solution?
public static PartitionTuple<T> Partition<T>(this IEnumerable<T> sequence, Func<T, bool> partition)
{
var a = sequence.ToArray();
return new PartitionTuple<T>
{
Before = a.TakeWhile(v => !partition(v)),
After = a.SkipWhile(v => !partition(v))
};
}
Doing sequence.ToArray() immediately defeats the laziness requirement. However, without that line, an expensive-to-iterate sequence may be iterated over twice. And, depending on what the calling code does, many more times.
You can use the Lazy object to ensure that the source sequence isn't converted to an array until one of the two partitions is iterated:
public static PartitionTuple<T> Partition<T>(
this IEnumerable<T> sequence, Func<T, bool> partition)
{
var lazy = new Lazy<IEnumerable<T>>(() => sequence.ToArray());
return new PartitionTuple<T>
{
Before = lazy.MapLazySequence(s => s.TakeWhile(v => !partition(v))),
After = lazy.MapLazySequence(s => s.SkipWhile(v => !partition(v)))
};
}
We'll use this method to defer evaluating the lazy until the sequence itself is iterated:
public static IEnumerable<TResult> MapLazySequence<TSource, TResult>(
this Lazy<IEnumerable<TSource>> lazy,
Func<IEnumerable<TSource>, IEnumerable<TResult>> filter)
{
foreach (var item in filter(lazy.Value))
yield return item;
}
This is an interesting problem and to get it right, you have to know what "right" is. For the semantics of the operation, I think that this definition makes sense:
The source sequence is only enumerated once even though the resulting sequences are enumerated several times.
The source sequence isn't enumerated until one of the results is enumerated.
Each of the results should be possible to enumerate independently.
If the source sequence changes, it is undefined what will happen.
I'm not entirely sure I got the handling of the matching object right, but I hope you get the idea. I'm deferring a lot of the work to the PartitionTuple<T> class to be able to be lazy.
public class PartitionTuple<T>
{
IEnumerable<T> source;
IList<T> before, after;
Func<T, bool> partition;
public PartitionTuple(IEnumerable<T> source, Func<T, bool> partition)
{
this.source = source;
this.partition = partition;
}
private void EnsureMaterialized()
{
if(before == null)
{
before = new List<T>();
after = new List<T>();
using(var enumerator = source.GetEnumerator())
{
while(enumerator.MoveNext() && !partition(enumerator.Current))
{
before.Add(enumerator.Current);
}
while(!partition(enumerator.Current) && enumerator.MoveNext());
while(enumerator.MoveNext())
{
after.Add(enumerator.Current);
}
}
}
}
public IEnumerable<T> Before
{
get
{
EnsureMaterialized();
return before;
}
}
public IEnumerable<T> After
{
get
{
EnsureMaterialized();
return after;
}
}
}
public static class Extensions
{
public static PartitionTuple<T> Partition<T>(this IEnumerable<T> sequence, Func<T, bool> partition)
{
return new PartitionTuple<T>(sequence, partition);
}
}
Here's a generic solution that will memoize any IEnumerable<T> to ensure it's only iterated once, without forcing the whole thing to iterate:
public class MemoizedEnumerable<T> : IEnumerable<T>, IDisposable
{
private readonly IEnumerator<T> _childEnumerator;
private readonly List<T> _itemCache = new List<T>();
public MemoizedEnumerable(IEnumerable<T> enumerableToMemoize)
{
_childEnumerator = enumerableToMemoize.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return _itemCache.Concat(EnumerateOnce()).GetEnumerator();
}
public void Dispose()
{
_childEnumerator.Dispose();
}
private IEnumerable<T> EnumerateOnce()
{
while (_childEnumerator.MoveNext())
{
_itemCache.Add(_childEnumerator.Current);
yield return _childEnumerator.Current;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static class EnumerableExtensions
{
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> enumerable)
{
return new MemoizedEnumerable<T>(enumerable);
}
}
To use it for your partitioning problem, do this:
var memoized = sequence.Memoize();
return new PartitionTuple<T>
{
Before = memoized.TakeWhile(v => !partition(v)),
After = memoized.SkipWhile(v => !partition(v))
};
This will only iterate sequence a maximum of one time.
Generally, you just return some object of your custom class, which implements IEnumerable<T> but also provides the results on enumeration demand only.
You can also implement IQueryable<T> (inherits IEnumerable) instead of IEnumerable<T>, but it's rather needed for building reach functionality with queries like the one, which linq for sql provides: database query being executed only on the final enumeration request.

Can I cache partially-executed LINQ queries?

I have the following code:
IEnumerable<KeyValuePair<T, double>> items =
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
if (items.Any(pair => pair.Value<0))
throw new ArgumentException("Item weights cannot be less than zero.");
double sum = items.Sum(pair => pair.Value);
foreach (KeyValuePair<T, double> pair in items) {...}
Where weight is a Func<T, double>.
The problem is I want weight to be executed as few times as possible. This means it should be executed at most once for each item. I could achieve this by saving it to an array. However, if any weight returns a negative value, I don't want to continue execution.
Is there any way to accomplish this easily within the LINQ framework?
Sure, that's totally doable:
public static Func<A, double> ThrowIfNegative<A, double>(this Func<A, double> f)
{
return a=>
{
double r = f(a);
// if r is NaN then this will throw.
if ( !(r >= 0.0) )
throw new Exception();
return r;
};
}
public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
{
var d = new Dictionary<A, R>();
return a=>
{
R r;
if (!d.TryGetValue(a, out r))
{
r = f(a);
d.Add(a, r);
}
return r;
};
}
And now...
Func<T, double> weight = whatever;
weight = weight.ThrowIfNegative().Memoize();
and you're done.
One way is to move the exception into the weight function, or at least simulate doing so, by doing something like:
Func<T, double> weightWithCheck = i =>
{
double result = weight(i);
if (result < 0)
{
throw new ArgumentException("Item weights cannot be less than zero.");
}
return result;
};
IEnumerable<KeyValuePair<T, double>> items =
sequence.Select(item => new KeyValuePair<T, double>(item, weightWithCheck(item)));
double sum = items.Sum(pair => pair.Value);
By this point, if there is an exception to be had, you should have it. You do have to enumerate items before you can be assured of getting the exception, though, but once you get it, you will not call weight again.
Both answers are good (where to throw the exception, and memoizing the function).
But your real problem is that your LINQ expression is evaluated every time you use it, unless you force it to evaluate and store as a List (or similar). Just change this:
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item)));
To this:
sequence.Select(item => new KeyValuePair<T, double>(item, weight(item))).ToList();
You could possibly do it with a foreach loop. Here is a way to do it in one statement:
IEnumerable<KeyValuePair<T, double>> items = sequence
.Select(item => new KeyValuePair<T, double>(item, weight(item)))
.Select(kvp =>
{
if (kvp.Value < 0)
throw new ArgumentException("Item weights cannot be less than zero.");
else
return kvp;
}
);
No, there is nothing already IN the LINQ framework to do this, but you could surely write up your own methods and invoke them from the linq query (As has already been shown by many).
Personally, I would either ToList the first query or use Eric's suggestion.
Instead of a functional memoization suggested by other answers, you can also employ a memoization for the whole data sequence:
var items = sequence
.Select(item => new KeyValuePair<T, double>(item, weight(item)))
.Memoize();
(Note a call to Memoize() method at the end of the expression above)
A nice property of data memoization is that it represents a drop-in replacement for ToList() or ToArray() approaches.
The fully featured implementation is pretty involved though:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;
static class MemoizationExtensions
{
/// <summary>
/// Memoize all elements of a sequence, e.g. ensure that every element of a sequence is retrieved only once.
/// </summary>
/// <remarks>
/// The resulting sequence is not thread safe.
/// </remarks>
/// <typeparam name="T">The type of the elements of source.</typeparam>
/// <param name="source">The source sequence.</param>
/// <returns>The sequence that fully replicates the source with all elements being memoized.</returns>
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source) => Memoize(source, false);
/// <summary>
/// Memoize all elements of a sequence, e.g. ensure that every element of a sequence is retrieved only once.
/// </summary>
/// <typeparam name="T">The type of the elements of source.</typeparam>
/// <param name="source">The source sequence.</param>
/// <param name="isThreadSafe">Indicates whether resulting sequence is thread safe.</param>
/// <returns>The sequence that fully replicates the source with all elements being memoized.</returns>
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> source, bool isThreadSafe)
{
switch (source)
{
case null:
return null;
case CachedEnumerable<T> existingCachedEnumerable:
if (!isThreadSafe || existingCachedEnumerable is ThreadSafeCachedEnumerable<T>)
{
// The source is already memoized with compatible parameters.
return existingCachedEnumerable;
}
break;
case IList<T> _:
case IReadOnlyList<T> _:
case string _:
// Given source types are intrinsically memoized by their nature.
return source;
}
if (isThreadSafe)
return new ThreadSafeCachedEnumerable<T>(source);
else
return new CachedEnumerable<T>(source);
}
class CachedEnumerable<T> : IEnumerable<T>, IReadOnlyList<T>
{
public CachedEnumerable(IEnumerable<T> source)
{
_Source = source;
}
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
IEnumerable<T> _Source;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
IEnumerator<T> _SourceEnumerator;
[DebuggerBrowsable(DebuggerBrowsableState.Never)]
protected readonly IList<T> Cache = new List<T>();
public virtual int Count
{
get
{
while (_TryCacheElementNoLock()) ;
return Cache.Count;
}
}
bool _TryCacheElementNoLock()
{
if (_SourceEnumerator == null && _Source != null)
{
_SourceEnumerator = _Source.GetEnumerator();
_Source = null;
}
if (_SourceEnumerator == null)
{
// Source enumerator already reached the end.
return false;
}
else if (_SourceEnumerator.MoveNext())
{
Cache.Add(_SourceEnumerator.Current);
return true;
}
else
{
// Source enumerator has reached the end, so it is no longer needed.
_SourceEnumerator.Dispose();
_SourceEnumerator = null;
return false;
}
}
public virtual T this[int index]
{
get
{
_EnsureItemIsCachedNoLock(index);
return Cache[index];
}
}
public IEnumerator<T> GetEnumerator() => new CachedEnumerator<T>(this);
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
internal virtual bool EnsureItemIsCached(int index) => _EnsureItemIsCachedNoLock(index);
bool _EnsureItemIsCachedNoLock(int index)
{
while (Cache.Count <= index)
{
if (!_TryCacheElementNoLock())
return false;
}
return true;
}
internal virtual T GetCacheItem(int index) => Cache[index];
}
sealed class ThreadSafeCachedEnumerable<T> : CachedEnumerable<T>
{
public ThreadSafeCachedEnumerable(IEnumerable<T> source) :
base(source)
{
}
public override int Count
{
get
{
lock (Cache)
return base.Count;
}
}
public override T this[int index]
{
get
{
lock (Cache)
return base[index];
}
}
internal override bool EnsureItemIsCached(int index)
{
lock (Cache)
return base.EnsureItemIsCached(index);
}
internal override T GetCacheItem(int index)
{
lock (Cache)
return base.GetCacheItem(index);
}
}
sealed class CachedEnumerator<T> : IEnumerator<T>
{
CachedEnumerable<T> _CachedEnumerable;
const int InitialIndex = -1;
const int EofIndex = -2;
int _Index = InitialIndex;
public CachedEnumerator(CachedEnumerable<T> cachedEnumerable)
{
_CachedEnumerable = cachedEnumerable;
}
public T Current
{
get
{
var cachedEnumerable = _CachedEnumerable;
if (cachedEnumerable == null)
throw new InvalidOperationException();
var index = _Index;
if (index < 0)
throw new InvalidOperationException();
return cachedEnumerable.GetCacheItem(index);
}
}
object IEnumerator.Current => Current;
public void Dispose()
{
_CachedEnumerable = null;
}
public bool MoveNext()
{
var cachedEnumerable = _CachedEnumerable;
if (cachedEnumerable == null)
{
// Disposed.
return false;
}
if (_Index == EofIndex)
return false;
_Index++;
if (!cachedEnumerable.EnsureItemIsCached(_Index))
{
_Index = EofIndex;
return false;
}
else
{
return true;
}
}
public void Reset()
{
_Index = InitialIndex;
}
}
}
More info and a readily available NuGet package: https://github.com/gapotchenko/Gapotchenko.FX/tree/master/Source/Gapotchenko.FX.Linq#memoize

Distinct() with lambda?

Right, so I have an enumerable and wish to get distinct values from it.
Using System.Linq, there's, of course, an extension method called Distinct. In the simple case, it can be used with no parameters, like:
var distinctValues = myStringList.Distinct();
Well and good, but if I have an enumerable of objects for which I need to specify equality, the only available overload is:
var distinctValues = myCustomerList.Distinct(someEqualityComparer);
The equality comparer argument must be an instance of IEqualityComparer<T>. I can do this, of course, but it's somewhat verbose and, well, cludgy.
What I would have expected is an overload that would take a lambda, say a Func<T, T, bool>:
var distinctValues = myCustomerList.Distinct((c1, c2) => c1.CustomerId == c2.CustomerId);
Anyone know if some such extension exists, or some equivalent workaround? Or am I missing something?
Alternatively, is there a way of specifying an IEqualityComparer inline (embarrass me)?
Update
I found a reply by Anders Hejlsberg to a post in an MSDN forum on this subject. He says:
The problem you're going to run into is that when two objects compare
equal they must have the same GetHashCode return value (or else the
hash table used internally by Distinct will not function correctly).
We use IEqualityComparer because it packages compatible
implementations of Equals and GetHashCode into a single interface.
I suppose that makes sense.
IEnumerable<Customer> filteredList = originalList
.GroupBy(customer => customer.CustomerId)
.Select(group => group.First());
It looks to me like you want DistinctBy from MoreLINQ. You can then write:
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId);
Here's a cut-down version of DistinctBy (no nullity checking and no option to specify your own key comparer):
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> knownKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
To Wrap things up . I think most of the people which came here like me want the simplest solution possible without using any libraries and with best possible performance.
(The accepted group by method for me i think is an overkill in terms of performance. )
Here is a simple extension method using the IEqualityComparer interface which works also for null values.
Usage:
var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();
Extension Method Code
public static class LinqExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
return items.Distinct(comparer);
}
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
private Func<T, TKey> expr { get; set; }
public GeneralPropertyComparer (Func<T, TKey> expr)
{
this.expr = expr;
}
public bool Equals(T left, T right)
{
var leftProp = expr.Invoke(left);
var rightProp = expr.Invoke(right);
if (leftProp == null && rightProp == null)
return true;
else if (leftProp == null ^ rightProp == null)
return false;
else
return leftProp.Equals(rightProp);
}
public int GetHashCode(T obj)
{
var prop = expr.Invoke(obj);
return (prop==null)? 0:prop.GetHashCode();
}
}
Shorthand solution
myCustomerList.GroupBy(c => c.CustomerId, (key, c) => c.FirstOrDefault());
No there is no such extension method overload for this. I've found this frustrating myself in the past and as such I usually write a helper class to deal with this problem. The goal is to convert a Func<T,T,bool> to IEqualityComparer<T,T>.
Example
public class EqualityFactory {
private sealed class Impl<T> : IEqualityComparer<T,T> {
private Func<T,T,bool> m_del;
private IEqualityComparer<T> m_comp;
public Impl(Func<T,T,bool> del) {
m_del = del;
m_comp = EqualityComparer<T>.Default;
}
public bool Equals(T left, T right) {
return m_del(left, right);
}
public int GetHashCode(T value) {
return m_comp.GetHashCode(value);
}
}
public static IEqualityComparer<T,T> Create<T>(Func<T,T,bool> del) {
return new Impl<T>(del);
}
}
This allows you to write the following
var distinctValues = myCustomerList
.Distinct(EqualityFactory.Create((c1, c2) => c1.CustomerId == c2.CustomerId));
Here's a simple extension method that does what I need...
public static class EnumerableExtensions
{
public static IEnumerable<TKey> Distinct<T, TKey>(this IEnumerable<T> source, Func<T, TKey> selector)
{
return source.GroupBy(selector).Select(x => x.Key);
}
}
It's a shame they didn't bake a distinct method like this into the framework, but hey ho.
This will do what you want but I don't know about performance:
var distinctValues =
from cust in myCustomerList
group cust by cust.CustomerId
into gcust
select gcust.First();
At least it's not verbose.
From .NET 6 or later, there is a new build-in method Enumerable.DistinctBy to achieve this.
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId);
// With IEqualityComparer
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId, someEqualityComparer);
Something I have used which worked well for me.
/// <summary>
/// A class to wrap the IEqualityComparer interface into matching functions for simple implementation
/// </summary>
/// <typeparam name="T">The type of object to be compared</typeparam>
public class MyIEqualityComparer<T> : IEqualityComparer<T>
{
/// <summary>
/// Create a new comparer based on the given Equals and GetHashCode methods
/// </summary>
/// <param name="equals">The method to compute equals of two T instances</param>
/// <param name="getHashCode">The method to compute a hashcode for a T instance</param>
public MyIEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
{
if (equals == null)
throw new ArgumentNullException("equals", "Equals parameter is required for all MyIEqualityComparer instances");
EqualsMethod = equals;
GetHashCodeMethod = getHashCode;
}
/// <summary>
/// Gets the method used to compute equals
/// </summary>
public Func<T, T, bool> EqualsMethod { get; private set; }
/// <summary>
/// Gets the method used to compute a hash code
/// </summary>
public Func<T, int> GetHashCodeMethod { get; private set; }
bool IEqualityComparer<T>.Equals(T x, T y)
{
return EqualsMethod(x, y);
}
int IEqualityComparer<T>.GetHashCode(T obj)
{
if (GetHashCodeMethod == null)
return obj.GetHashCode();
return GetHashCodeMethod(obj);
}
}
All solutions I've seen here rely on selecting an already comparable field. If one needs to compare in a different way, though, this solution here seems to work generally, for something like:
somedoubles.Distinct(new LambdaComparer<double>((x, y) => Math.Abs(x - y) < double.Epsilon)).Count()
Take another way:
var distinctValues = myCustomerList.
Select(x => x._myCaustomerProperty).Distinct();
The sequence return distinct elements compare them by property '_myCaustomerProperty' .
You can use LambdaEqualityComparer:
var distinctValues
= myCustomerList.Distinct(new LambdaEqualityComparer<OurType>((c1, c2) => c1.CustomerId == c2.CustomerId));
public class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
public LambdaEqualityComparer(Func<T, T, bool> equalsFunction)
{
_equalsFunction = equalsFunction;
}
public bool Equals(T x, T y)
{
return _equalsFunction(x, y);
}
public int GetHashCode(T obj)
{
return obj.GetHashCode();
}
private readonly Func<T, T, bool> _equalsFunction;
}
You can use InlineComparer
public class InlineComparer<T> : IEqualityComparer<T>
{
//private readonly Func<T, T, bool> equalsMethod;
//private readonly Func<T, int> getHashCodeMethod;
public Func<T, T, bool> EqualsMethod { get; private set; }
public Func<T, int> GetHashCodeMethod { get; private set; }
public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
{
if (equals == null) throw new ArgumentNullException("equals", "Equals parameter is required for all InlineComparer instances");
EqualsMethod = equals;
GetHashCodeMethod = hashCode;
}
public bool Equals(T x, T y)
{
return EqualsMethod(x, y);
}
public int GetHashCode(T obj)
{
if (GetHashCodeMethod == null) return obj.GetHashCode();
return GetHashCodeMethod(obj);
}
}
Usage sample:
var comparer = new InlineComparer<DetalleLog>((i1, i2) => i1.PeticionEV == i2.PeticionEV && i1.Etiqueta == i2.Etiqueta, i => i.PeticionEV.GetHashCode() + i.Etiqueta.GetHashCode());
var peticionesEV = listaLogs.Distinct(comparer).ToList();
Assert.IsNotNull(peticionesEV);
Assert.AreNotEqual(0, peticionesEV.Count);
Source:
https://stackoverflow.com/a/5969691/206730
Using IEqualityComparer for Union
Can I specify my explicit type comparator inline?
If Distinct() doesn't produce unique results, try this one:
var filteredWC = tblWorkCenter.GroupBy(cc => cc.WCID_I).Select(grp => grp.First()).Select(cc => new Model.WorkCenter { WCID = cc.WCID_I }).OrderBy(cc => cc.WCID);
ObservableCollection<Model.WorkCenter> WorkCenter = new ObservableCollection<Model.WorkCenter>(filteredWC);
A tricky way to do this is use Aggregate() extension, using a dictionary as accumulator with the key-property values as keys:
var customers = new List<Customer>();
var distincts = customers.Aggregate(new Dictionary<int, Customer>(),
(d, e) => { d[e.CustomerId] = e; return d; },
d => d.Values);
And a GroupBy-style solution is using ToLookup():
var distincts = customers.ToLookup(c => c.CustomerId).Select(g => g.First());
IEnumerable lambda extension:
public static class ListExtensions
{
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> list, Func<T, int> hashCode)
{
Dictionary<int, T> hashCodeDic = new Dictionary<int, T>();
list.ToList().ForEach(t =>
{
var key = hashCode(t);
if (!hashCodeDic.ContainsKey(key))
hashCodeDic.Add(key, t);
});
return hashCodeDic.Select(kvp => kvp.Value);
}
}
Usage:
class Employee
{
public string Name { get; set; }
public int EmployeeID { get; set; }
}
//Add 5 employees to List
List<Employee> lst = new List<Employee>();
Employee e = new Employee { Name = "Shantanu", EmployeeID = 123456 };
lst.Add(e);
lst.Add(e);
Employee e1 = new Employee { Name = "Adam Warren", EmployeeID = 823456 };
lst.Add(e1);
//Add a space in the Name
Employee e2 = new Employee { Name = "Adam Warren", EmployeeID = 823456 };
lst.Add(e2);
//Name is different case
Employee e3 = new Employee { Name = "adam warren", EmployeeID = 823456 };
lst.Add(e3);
//Distinct (without IEqalityComparer<T>) - Returns 4 employees
var lstDistinct1 = lst.Distinct();
//Lambda Extension - Return 2 employees
var lstDistinct = lst.Distinct(employee => employee.EmployeeID.GetHashCode() ^ employee.Name.ToUpper().Replace(" ", "").GetHashCode());
The Microsoft System.Interactive package has a version of Distinct that takes a key selector lambda. This is effectively the same as Jon Skeet's solution, but it may be helpful for people to know, and to check out the rest of the library.
Here's how you can do it:
public static class Extensions
{
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query,
Func<T, V> f,
Func<IGrouping<V,T>,T> h=null)
{
if (h==null) h=(x => x.First());
return query.GroupBy(f).Select(h);
}
}
This method allows you to use it by specifying one parameter like .MyDistinct(d => d.Name), but it also allows you to specify a having condition as a second parameter like so:
var myQuery = (from x in _myObject select x).MyDistinct(d => d.Name,
x => x.FirstOrDefault(y=>y.Name.Contains("1") || y.Name.Contains("2"))
);
N.B. This would also allow you to specify other functions like for example .LastOrDefault(...) as well.
If you want to expose just the condition, you can have it even simpler by implementing it as:
public static IEnumerable<T> MyDistinct2<T, V>(this IEnumerable<T> query,
Func<T, V> f,
Func<T,bool> h=null
)
{
if (h == null) h = (y => true);
return query.GroupBy(f).Select(x=>x.FirstOrDefault(h));
}
In this case, the query would just look like:
var myQuery2 = (from x in _myObject select x).MyDistinct2(d => d.Name,
y => y.Name.Contains("1") || y.Name.Contains("2")
);
N.B. Here, the expression is simpler, but note .MyDistinct2 uses .FirstOrDefault(...) implicitly.
Note: The examples above are using the following demo class
class MyObject
{
public string Name;
public string Code;
}
private MyObject[] _myObject = {
new MyObject() { Name = "Test1", Code = "T"},
new MyObject() { Name = "Test2", Code = "Q"},
new MyObject() { Name = "Test2", Code = "T"},
new MyObject() { Name = "Test5", Code = "Q"}
};
I'm assuming you have an IEnumerable<T>, and in your example delegate, you would like c1 and c2 to be referring to two elements in this list?
I believe you could achieve this with a self join:
var distinctResults = from c1 in myList
join c2 in myList on <your equality conditions>
I found this as the easiest solution.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
return source.GroupBy(keySelector).Select(x => x.FirstOrDefault());
}

Categories

Resources