How to implement my own operator in rx.net - c#

I need the functionality of a hysteresis filter in RX. It should emit a value from the source stream only when the previously emitted value and the current input value differ by a certain amount. As a generic extension method, it could have the following signature:
public static IObservable<T> HysteresisFilter<T>(this IObservable<t> source, Func<T/*previously emitted*/, T/*current*/, bool> filter)
I was not able to figure out how to implement this with existing operators. I was looking for something like lift from RxJava, any other method to create my own operator. I have seen this checklist, but I haven't found any example on the web.
The following approaches (both are actually the same) which seem workaround to me work, but is there a more Rx way to do this, like without wrapping a subject or actually implementing an operator?
async Task Main()
{
var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));
var rnd = new Random();
var s = Observable.Interval(TimeSpan.FromMilliseconds(10))
.Scan(0d, (a,_) => a + rnd.NextDouble() - 0.5)
.Publish()
.AutoConnect()
;
s.Subscribe(Console.WriteLine, cts.Token);
s.HysteresisFilter((p, c) => Math.Abs(p - c) > 1d).Subscribe(x => Console.WriteLine($"1> {x}"), cts.Token);
s.HysteresisFilter2((p, c) => Math.Abs(p - c) > 1d).Subscribe(x => Console.WriteLine($"2> {x}"), cts.Token);
await Task.Delay(Timeout.InfiniteTimeSpan, cts.Token).ContinueWith(_=>_, TaskContinuationOptions.OnlyOnCanceled);
}
public static class ReactiveOperators
{
public static IObservable<T> HysteresisFilter<T>(this IObservable<T> source, Func<T, T, bool> filter)
{
return new InternalHysteresisFilter<T>(source, filter).AsObservable;
}
public static IObservable<T> HysteresisFilter2<T>(this IObservable<T> source, Func<T, T, bool> filter)
{
var subject = new Subject<T>();
T lastEmitted = default;
bool emitted = false;
source.Subscribe(
value =>
{
if (!emitted || filter(lastEmitted, value))
{
subject.OnNext(value);
lastEmitted = value;
emitted = true;
}
}
, ex => subject.OnError(ex)
, () => subject.OnCompleted()
);
return subject;
}
private class InternalHysteresisFilter<T>: IObserver<T>
{
Func<T, T, bool> filter;
T lastEmitted;
bool emitted;
private readonly Subject<T> subject = new Subject<T>();
public IObservable<T> AsObservable => subject;
public InternalHysteresisFilter(IObservable<T> source, Func<T, T, bool> filter)
{
this.filter = filter;
source.Subscribe(this);
}
public IDisposable Subscribe(IObserver<T> observer)
{
return subject.Subscribe(observer);
}
public void OnNext(T value)
{
if (!emitted || filter(lastEmitted, value))
{
subject.OnNext(value);
lastEmitted = value;
emitted = true;
}
}
public void OnError(Exception error)
{
subject.OnError(error);
}
public void OnCompleted()
{
subject.OnCompleted();
}
}
}
Sidenote: There will be several thousand of such filters applied to as many streams. I need throughput over latency, thus I am looking for the solution with the minimum of overhead both in CPU and in memory even if others look fancier.

Most examples I've seen in the book Introduction to Rx are using the method Observable.Create for creating new operators.
The Create factory method is the preferred way to implement custom observable sequences. The usage of subjects should largely remain in the realms of samples and testing. (citation)
public static IObservable<T> HysteresisFilter<T>(this IObservable<T> source,
Func<T, T, bool> predicate)
{
return Observable.Create<T>(observer =>
{
T lastEmitted = default;
bool emitted = false;
return source.Subscribe(value =>
{
if (!emitted || predicate(lastEmitted, value))
{
observer.OnNext(value);
lastEmitted = value;
emitted = true;
}
}, observer.OnError, observer.OnCompleted);
});
}

This answer is the same is equivalent to #Theodor's, but it avoids using Observable.Create, which I generally would avoid.
public static IObservable<T> HysteresisFilter2<T>(this IObservable<T> source,
Func<T, T, bool> predicate)
{
return source
.Scan((emitted: default(T), isFirstItem: true, emit: false), (state, newItem) => state.isFirstItem || predicate(state.emitted, newItem)
? (newItem, false, true)
: (state.emitted, false, false)
)
.Where(t => t.emit)
.Select(t => t.emitted);
}
.Scan is what you want to use when you're tracking state across items within an observable.

Related

Creating a collection with a function to obtain the next member

I need to accumulate values into a collection, based on an arbitrary function. Each value is derived from calling a function on the previous value.
My current attempt:
public static T[] Aggregate<T>(this T source, Func<T, T> func)
{
var arr = new List<T> { };
var current = source;
while(current != null)
{
arr.Add(current);
current = func(current);
};
return arr.ToArray();
}
Is there a built-in .Net Framework function to do this?
This operation is usually called Unfold. There's no built-in version but it is implemented in FSharp.Core, so you could wrap that:
public static IEnumerable<T> Unfold<T, TState>(TState init, Func<TState, T> gen)
{
var liftF = new Converter<TState, Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>>(x =>
{
var r = gen(x);
if (r == null)
{
return Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>.None;
}
else
{
return Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>.Some(Tuple.Create(r, x));
}
});
var ff = Microsoft.FSharp.Core.FSharpFunc<TState, Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>>.FromConverter(liftF);
return Microsoft.FSharp.Collections.SeqModule.Unfold<TState, T>(ff, init);
}
public static IEnumerable<T> Unfold<T>(T source, Func<T, T> func)
{
return Unfold<T>(source, func);
}
however writing your own version would be simpler:
public static IEnumerable<T> Unfold<T>(T source, Func<T, T> func)
{
T current = source;
while(current != null)
{
yield return current;
current = func(current);
}
}
You are referring to an anamorphism as mentioned here linq-unfold-operator, which is the dual of a catamorphism.
Unfold is the dual of Aggregate. Aggregate exists in the .Net Framework; Unfold does not (for some unknown reason). Hence your confusion.
/// seeds: the initial data to unfold
/// stop: if stop(seed) is True, don't go any further
/// map: transform the seed into the final data
/// next: generate the next seed value from the current seed
public static IEnumerable<R> UnFold<T,R>(this IEnumerable<T> seeds, Predicate<T> stop,
Func<T,R> map, Func<T,IEnumerable<T>> next) {
foreach (var seed in seeds) {
if (!stop(seed)) {
yield return map(seed);
foreach (var val in next(seed).UnFold(stop, map, next))
yield return val;
}
}
}
Usage Example:
var parents = new[]{someType}.UnFold(t => t == null, t => t,
t => t.GetInterfaces().Concat(new[]{t.BaseType}))
.Distinct();

Rx – Distinct with timeout?

I’m wondering is there any way to implement Distinct in Reactive Extensions for .NET in such way that it will be working for given time and after this time it should reset and allow values that are come back again. I need this for hot source in application that will be working for whole year with now stops so I’m worried about performance and I need those values to be allowed after some time. There is also DistinctUntilChanged but in my case values could be mixed – for example: A A X A, DistinctUntilChanged will give me A X A and I need result A X and after given time distinct should be reset.
The accepted answer is flawed; flaw demonstrated below. Here's a demonstration of solution, with a test batch:
TestScheduler ts = new TestScheduler();
var source = ts.CreateHotObservable<char>(
new Recorded<Notification<char>>(200.MsTicks(), Notification.CreateOnNext('A')),
new Recorded<Notification<char>>(300.MsTicks(), Notification.CreateOnNext('B')),
new Recorded<Notification<char>>(400.MsTicks(), Notification.CreateOnNext('A')),
new Recorded<Notification<char>>(500.MsTicks(), Notification.CreateOnNext('A')),
new Recorded<Notification<char>>(510.MsTicks(), Notification.CreateOnNext('C')),
new Recorded<Notification<char>>(550.MsTicks(), Notification.CreateOnNext('B')),
new Recorded<Notification<char>>(610.MsTicks(), Notification.CreateOnNext('B'))
);
var target = source.TimedDistinct(TimeSpan.FromMilliseconds(300), ts);
var expectedResults = ts.CreateHotObservable<char>(
new Recorded<Notification<char>>(200.MsTicks(), Notification.CreateOnNext('A')),
new Recorded<Notification<char>>(300.MsTicks(), Notification.CreateOnNext('B')),
new Recorded<Notification<char>>(500.MsTicks(), Notification.CreateOnNext('A')),
new Recorded<Notification<char>>(510.MsTicks(), Notification.CreateOnNext('C')),
new Recorded<Notification<char>>(610.MsTicks(), Notification.CreateOnNext('B'))
);
var observer = ts.CreateObserver<char>();
target.Subscribe(observer);
ts.Start();
ReactiveAssert.AreElementsEqual(expectedResults.Messages, observer.Messages);
Solution includes a number of overloads for TimedDistinct, allowing for IScheduler injection, as well as IEqualityComparer<T> injection, similar to Distinct. Ignoring all those overloads, the solution rests on a helper method StateWhere, which is kind of like a combination of Scan and Where: It filters like a Where, but allows you to embed state in it like Scan.
public static class RxState
{
public static IObservable<TSource> TimedDistinct<TSource>(this IObservable<TSource> source, TimeSpan expirationTime)
{
return TimedDistinct(source, expirationTime, Scheduler.Default);
}
public static IObservable<TSource> TimedDistinct<TSource>(this IObservable<TSource> source, TimeSpan expirationTime, IScheduler scheduler)
{
return TimedDistinct<TSource>(source, expirationTime, EqualityComparer<TSource>.Default, scheduler);
}
public static IObservable<TSource> TimedDistinct<TSource>(this IObservable<TSource> source, TimeSpan expirationTime, IEqualityComparer<TSource> comparer)
{
return TimedDistinct(source, expirationTime, comparer, Scheduler.Default);
}
public static IObservable<TSource> TimedDistinct<TSource>(this IObservable<TSource> source, TimeSpan expirationTime, IEqualityComparer<TSource> comparer, IScheduler scheduler)
{
var toReturn = source
.Timestamp(scheduler)
.StateWhere(
new Dictionary<TSource, DateTimeOffset>(comparer),
(state, item) => item.Value,
(state, item) => state
.Where(kvp => item.Timestamp - kvp.Value < expirationTime)
.Concat(
!state.ContainsKey(item.Value) || item.Timestamp - state[item.Value] >= expirationTime
? Enumerable.Repeat(new KeyValuePair<TSource, DateTimeOffset>(item.Value, item.Timestamp), 1)
: Enumerable.Empty<KeyValuePair<TSource, DateTimeOffset>>()
)
.ToDictionary(kvp => kvp.Key, kvp => kvp.Value, comparer),
(state, item) => !state.ContainsKey(item.Value) || item.Timestamp - state[item.Value] >= expirationTime
);
return toReturn;
}
public static IObservable<TResult> StateSelectMany<TSource, TState, TResult>(
this IObservable<TSource> source,
TState initialState,
Func<TState, TSource, IObservable<TResult>> resultSelector,
Func<TState, TSource, TState> stateSelector
)
{
return source
.Scan(Tuple.Create(initialState, Observable.Empty<TResult>()), (state, item) => Tuple.Create(stateSelector(state.Item1, item), resultSelector(state.Item1, item)))
.SelectMany(t => t.Item2);
}
public static IObservable<TResult> StateWhere<TSource, TState, TResult>(
this IObservable<TSource> source,
TState initialState,
Func<TState, TSource, TResult> resultSelector,
Func<TState, TSource, TState> stateSelector,
Func<TState, TSource, bool> filter
)
{
return source
.StateSelectMany(initialState, (state, item) =>
filter(state, item) ? Observable.Return(resultSelector(state, item)) : Observable.Empty<TResult>(),
stateSelector);
}
}
The accepted answer has two flaws:
It doesn't accept IScheduler injection, meaning that it is hard to test within the Rx testing framework. This is easy to fix.
It relies on mutable state, which doesn't work well in a multi-threaded framework like Rx.
Issue #2 is noticeable with multiple subscribers:
var observable = Observable.Range(0, 5)
.DistinctFor(TimeSpan.MaxValue)
;
observable.Subscribe(i => Console.WriteLine(i));
observable.Subscribe(i => Console.WriteLine(i));
The output, following regular Rx behavior, should be outputting 0-4 twice. Instead, 0-4 is outputted just once.
Here's another sample flaw:
var subject = new Subject<int>();
var observable = subject
.DistinctFor(TimeSpan.MaxValue);
observable.Subscribe(i => Console.WriteLine(i));
observable.Subscribe(i => Console.WriteLine(i));
subject.OnNext(1);
subject.OnNext(2);
subject.OnNext(3);
This outputs 1 2 3 once, not twice.
Here's the code for MsTicks:
public static class RxTestingHelpers
{
public static long MsTicks(this int ms)
{
return TimeSpan.FromMilliseconds(ms).Ticks;
}
}
With a wrapper class that timestamps items, but does not consider the timestamp (created field) for hashing or equality:
public class DistinctForItem<T> : IEquatable<DistinctForItem<T>>
{
private readonly T item;
private DateTime created;
public DistinctForItem(T item)
{
this.item = item;
this.created = DateTime.UtcNow;
}
public T Item
{
get { return item; }
}
public DateTime Created
{
get { return created; }
}
public bool Equals(DistinctForItem<T> other)
{
if (ReferenceEquals(null, other)) return false;
if (ReferenceEquals(this, other)) return true;
return EqualityComparer<T>.Default.Equals(Item, other.Item);
}
public override bool Equals(object obj)
{
if (ReferenceEquals(null, obj)) return false;
if (ReferenceEquals(this, obj)) return true;
if (obj.GetType() != this.GetType()) return false;
return Equals((DistinctForItem<T>)obj);
}
public override int GetHashCode()
{
return EqualityComparer<T>.Default.GetHashCode(Item);
}
public static bool operator ==(DistinctForItem<T> left, DistinctForItem<T> right)
{
return Equals(left, right);
}
public static bool operator !=(DistinctForItem<T> left, DistinctForItem<T> right)
{
return !Equals(left, right);
}
}
It is now possible to write a DistinctFor extension method:
public static IObservable<T> DistinctFor<T>(this IObservable<T> src,
TimeSpan validityPeriod)
{
//if HashSet<DistinctForItem<T>> actually allowed us the get at the
//items it contains it would be a better choice.
//However it doesn't, so we resort to
//Dictionary<DistinctForItem<T>, DistinctForItem<T>> instead.
var hs = new Dictionary<DistinctForItem<T>, DistinctForItem<T>>();
return src.Select(item => new DistinctForItem<T>(item)).Where(df =>
{
DistinctForItem<T> hsVal;
if (hs.TryGetValue(df, out hsVal))
{
var age = DateTime.UtcNow - hsVal.Created;
if (age < validityPeriod)
{
return false;
}
}
hs[df] = df;
return true;
}).Select(df => df.Item);
}
Which can be used:
Enumerable.Range(0, 1000)
.Select(i => i % 3)
.ToObservable()
.Pace(TimeSpan.FromMilliseconds(500)) //drip feeds the observable
.DistinctFor(TimeSpan.FromSeconds(5))
.Subscribe(x => Console.WriteLine(x));
For reference, here is my implementation of Pace<T>:
public static IObservable<T> Pace<T>(this IObservable<T> src, TimeSpan delay)
{
var timer = Observable
.Timer(
TimeSpan.FromSeconds(0),
delay
);
return src.Zip(timer, (s, t) => s);
}

Lazily partition sequence with LINQ

I have the following extension method to find an element within a sequence, and then return two IEnumerable<T>s: one containing all the elements before that element, and one containing the element and everything that follows. I would prefer if the method were lazy, but I haven't figured out a way to do that. Can anyone come up with a solution?
public static PartitionTuple<T> Partition<T>(this IEnumerable<T> sequence, Func<T, bool> partition)
{
var a = sequence.ToArray();
return new PartitionTuple<T>
{
Before = a.TakeWhile(v => !partition(v)),
After = a.SkipWhile(v => !partition(v))
};
}
Doing sequence.ToArray() immediately defeats the laziness requirement. However, without that line, an expensive-to-iterate sequence may be iterated over twice. And, depending on what the calling code does, many more times.
You can use the Lazy object to ensure that the source sequence isn't converted to an array until one of the two partitions is iterated:
public static PartitionTuple<T> Partition<T>(
this IEnumerable<T> sequence, Func<T, bool> partition)
{
var lazy = new Lazy<IEnumerable<T>>(() => sequence.ToArray());
return new PartitionTuple<T>
{
Before = lazy.MapLazySequence(s => s.TakeWhile(v => !partition(v))),
After = lazy.MapLazySequence(s => s.SkipWhile(v => !partition(v)))
};
}
We'll use this method to defer evaluating the lazy until the sequence itself is iterated:
public static IEnumerable<TResult> MapLazySequence<TSource, TResult>(
this Lazy<IEnumerable<TSource>> lazy,
Func<IEnumerable<TSource>, IEnumerable<TResult>> filter)
{
foreach (var item in filter(lazy.Value))
yield return item;
}
This is an interesting problem and to get it right, you have to know what "right" is. For the semantics of the operation, I think that this definition makes sense:
The source sequence is only enumerated once even though the resulting sequences are enumerated several times.
The source sequence isn't enumerated until one of the results is enumerated.
Each of the results should be possible to enumerate independently.
If the source sequence changes, it is undefined what will happen.
I'm not entirely sure I got the handling of the matching object right, but I hope you get the idea. I'm deferring a lot of the work to the PartitionTuple<T> class to be able to be lazy.
public class PartitionTuple<T>
{
IEnumerable<T> source;
IList<T> before, after;
Func<T, bool> partition;
public PartitionTuple(IEnumerable<T> source, Func<T, bool> partition)
{
this.source = source;
this.partition = partition;
}
private void EnsureMaterialized()
{
if(before == null)
{
before = new List<T>();
after = new List<T>();
using(var enumerator = source.GetEnumerator())
{
while(enumerator.MoveNext() && !partition(enumerator.Current))
{
before.Add(enumerator.Current);
}
while(!partition(enumerator.Current) && enumerator.MoveNext());
while(enumerator.MoveNext())
{
after.Add(enumerator.Current);
}
}
}
}
public IEnumerable<T> Before
{
get
{
EnsureMaterialized();
return before;
}
}
public IEnumerable<T> After
{
get
{
EnsureMaterialized();
return after;
}
}
}
public static class Extensions
{
public static PartitionTuple<T> Partition<T>(this IEnumerable<T> sequence, Func<T, bool> partition)
{
return new PartitionTuple<T>(sequence, partition);
}
}
Here's a generic solution that will memoize any IEnumerable<T> to ensure it's only iterated once, without forcing the whole thing to iterate:
public class MemoizedEnumerable<T> : IEnumerable<T>, IDisposable
{
private readonly IEnumerator<T> _childEnumerator;
private readonly List<T> _itemCache = new List<T>();
public MemoizedEnumerable(IEnumerable<T> enumerableToMemoize)
{
_childEnumerator = enumerableToMemoize.GetEnumerator();
}
public IEnumerator<T> GetEnumerator()
{
return _itemCache.Concat(EnumerateOnce()).GetEnumerator();
}
public void Dispose()
{
_childEnumerator.Dispose();
}
private IEnumerable<T> EnumerateOnce()
{
while (_childEnumerator.MoveNext())
{
_itemCache.Add(_childEnumerator.Current);
yield return _childEnumerator.Current;
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
public static class EnumerableExtensions
{
public static IEnumerable<T> Memoize<T>(this IEnumerable<T> enumerable)
{
return new MemoizedEnumerable<T>(enumerable);
}
}
To use it for your partitioning problem, do this:
var memoized = sequence.Memoize();
return new PartitionTuple<T>
{
Before = memoized.TakeWhile(v => !partition(v)),
After = memoized.SkipWhile(v => !partition(v))
};
This will only iterate sequence a maximum of one time.
Generally, you just return some object of your custom class, which implements IEnumerable<T> but also provides the results on enumeration demand only.
You can also implement IQueryable<T> (inherits IEnumerable) instead of IEnumerable<T>, but it's rather needed for building reach functionality with queries like the one, which linq for sql provides: database query being executed only on the final enumeration request.

Using IEqualityComparer for Union

I simply want to remove duplicates from two lists and combine them into one list. I also need to be able to define what a duplicate is. I define a duplicate by the ColumnIndex property, if they are the same, they are duplicates. Here is the approach I took:
I found a nifty example of how to write inline comparers for the random occassions where you need em only once in a code segment.
public class InlineComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, T, bool> getEquals;
private readonly Func<T, int> getHashCode;
public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
{
getEquals = equals;
getHashCode = hashCode;
}
public bool Equals(T x, T y)
{
return getEquals(x, y);
}
public int GetHashCode(T obj)
{
return getHashCode(obj);
}
}
Then I just have my two lists, and attempt a union on them with the comparer.
var formatIssues = issues.Where(i => i.IsFormatError == true);
var groupIssues = issues.Where(i => i.IsGroupError == true);
var dupComparer = new InlineComparer<Issue>((i1, i2) => i1.ColumnInfo.ColumnIndex == i2.ColumnInfo.ColumnIndex,
i => i.ColumnInfo.ColumnIndex);
var filteredIssues = groupIssues.Union(formatIssues, dupComparer);
The result set however is null.
Where am I going astray?
I have already confirmed that the two lists have columns with equal ColumnIndex properties.
I've just run your code on a test set.... and it works!
public class InlineComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, T, bool> getEquals;
private readonly Func<T, int> getHashCode;
public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
{
getEquals = equals;
getHashCode = hashCode;
}
public bool Equals(T x, T y)
{
return getEquals(x, y);
}
public int GetHashCode(T obj)
{
return getHashCode(obj);
}
}
class TestClass
{
public string S { get; set; }
}
[TestMethod]
public void testThis()
{
var l1 = new List<TestClass>()
{
new TestClass() {S = "one"},
new TestClass() {S = "two"},
};
var l2 = new List<TestClass>()
{
new TestClass() {S = "three"},
new TestClass() {S = "two"},
};
var dupComparer = new InlineComparer<TestClass>((i1, i2) => i1.S == i2.S, i => i.S.GetHashCode());
var unionList = l1.Union(l2, dupComparer);
Assert.AreEqual(3, unionList);
}
So... maybe go back and check your test data - or run it with some other test data?
After all - for a Union to be empty - that suggests that both your input lists are also empty?
A slightly simpler way:
it does preserve the original order
it ignores dupes as it finds them
Uses a link extension method:
formatIssues.Union(groupIssues).DistinctBy(x => x.ColumnIndex)
This is the DistinctBy lambda method from MoreLinq
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> knownKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
Would the Linq Except method not do it for you?
var formatIssues = issues.Where(i => i.IsFormatError == true);
var groupIssues = issues.Where(i => i.IsGroupError == true);
var dupeIssues = issues.Where(i => issues.Except(new List<Issue> {i})
.Any(x => x.ColumnIndex == i.ColumnIndex));
var filteredIssues = formatIssues.Union(groupIssues).Except(dupeIssues);

Distinct() with lambda?

Right, so I have an enumerable and wish to get distinct values from it.
Using System.Linq, there's, of course, an extension method called Distinct. In the simple case, it can be used with no parameters, like:
var distinctValues = myStringList.Distinct();
Well and good, but if I have an enumerable of objects for which I need to specify equality, the only available overload is:
var distinctValues = myCustomerList.Distinct(someEqualityComparer);
The equality comparer argument must be an instance of IEqualityComparer<T>. I can do this, of course, but it's somewhat verbose and, well, cludgy.
What I would have expected is an overload that would take a lambda, say a Func<T, T, bool>:
var distinctValues = myCustomerList.Distinct((c1, c2) => c1.CustomerId == c2.CustomerId);
Anyone know if some such extension exists, or some equivalent workaround? Or am I missing something?
Alternatively, is there a way of specifying an IEqualityComparer inline (embarrass me)?
Update
I found a reply by Anders Hejlsberg to a post in an MSDN forum on this subject. He says:
The problem you're going to run into is that when two objects compare
equal they must have the same GetHashCode return value (or else the
hash table used internally by Distinct will not function correctly).
We use IEqualityComparer because it packages compatible
implementations of Equals and GetHashCode into a single interface.
I suppose that makes sense.
IEnumerable<Customer> filteredList = originalList
.GroupBy(customer => customer.CustomerId)
.Select(group => group.First());
It looks to me like you want DistinctBy from MoreLINQ. You can then write:
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId);
Here's a cut-down version of DistinctBy (no nullity checking and no option to specify your own key comparer):
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> knownKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
To Wrap things up . I think most of the people which came here like me want the simplest solution possible without using any libraries and with best possible performance.
(The accepted group by method for me i think is an overkill in terms of performance. )
Here is a simple extension method using the IEqualityComparer interface which works also for null values.
Usage:
var filtered = taskList.DistinctBy(t => t.TaskExternalId).ToArray();
Extension Method Code
public static class LinqExtensions
{
public static IEnumerable<T> DistinctBy<T, TKey>(this IEnumerable<T> items, Func<T, TKey> property)
{
GeneralPropertyComparer<T, TKey> comparer = new GeneralPropertyComparer<T,TKey>(property);
return items.Distinct(comparer);
}
}
public class GeneralPropertyComparer<T,TKey> : IEqualityComparer<T>
{
private Func<T, TKey> expr { get; set; }
public GeneralPropertyComparer (Func<T, TKey> expr)
{
this.expr = expr;
}
public bool Equals(T left, T right)
{
var leftProp = expr.Invoke(left);
var rightProp = expr.Invoke(right);
if (leftProp == null && rightProp == null)
return true;
else if (leftProp == null ^ rightProp == null)
return false;
else
return leftProp.Equals(rightProp);
}
public int GetHashCode(T obj)
{
var prop = expr.Invoke(obj);
return (prop==null)? 0:prop.GetHashCode();
}
}
Shorthand solution
myCustomerList.GroupBy(c => c.CustomerId, (key, c) => c.FirstOrDefault());
No there is no such extension method overload for this. I've found this frustrating myself in the past and as such I usually write a helper class to deal with this problem. The goal is to convert a Func<T,T,bool> to IEqualityComparer<T,T>.
Example
public class EqualityFactory {
private sealed class Impl<T> : IEqualityComparer<T,T> {
private Func<T,T,bool> m_del;
private IEqualityComparer<T> m_comp;
public Impl(Func<T,T,bool> del) {
m_del = del;
m_comp = EqualityComparer<T>.Default;
}
public bool Equals(T left, T right) {
return m_del(left, right);
}
public int GetHashCode(T value) {
return m_comp.GetHashCode(value);
}
}
public static IEqualityComparer<T,T> Create<T>(Func<T,T,bool> del) {
return new Impl<T>(del);
}
}
This allows you to write the following
var distinctValues = myCustomerList
.Distinct(EqualityFactory.Create((c1, c2) => c1.CustomerId == c2.CustomerId));
Here's a simple extension method that does what I need...
public static class EnumerableExtensions
{
public static IEnumerable<TKey> Distinct<T, TKey>(this IEnumerable<T> source, Func<T, TKey> selector)
{
return source.GroupBy(selector).Select(x => x.Key);
}
}
It's a shame they didn't bake a distinct method like this into the framework, but hey ho.
This will do what you want but I don't know about performance:
var distinctValues =
from cust in myCustomerList
group cust by cust.CustomerId
into gcust
select gcust.First();
At least it's not verbose.
From .NET 6 or later, there is a new build-in method Enumerable.DistinctBy to achieve this.
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId);
// With IEqualityComparer
var distinctValues = myCustomerList.DistinctBy(c => c.CustomerId, someEqualityComparer);
Something I have used which worked well for me.
/// <summary>
/// A class to wrap the IEqualityComparer interface into matching functions for simple implementation
/// </summary>
/// <typeparam name="T">The type of object to be compared</typeparam>
public class MyIEqualityComparer<T> : IEqualityComparer<T>
{
/// <summary>
/// Create a new comparer based on the given Equals and GetHashCode methods
/// </summary>
/// <param name="equals">The method to compute equals of two T instances</param>
/// <param name="getHashCode">The method to compute a hashcode for a T instance</param>
public MyIEqualityComparer(Func<T, T, bool> equals, Func<T, int> getHashCode)
{
if (equals == null)
throw new ArgumentNullException("equals", "Equals parameter is required for all MyIEqualityComparer instances");
EqualsMethod = equals;
GetHashCodeMethod = getHashCode;
}
/// <summary>
/// Gets the method used to compute equals
/// </summary>
public Func<T, T, bool> EqualsMethod { get; private set; }
/// <summary>
/// Gets the method used to compute a hash code
/// </summary>
public Func<T, int> GetHashCodeMethod { get; private set; }
bool IEqualityComparer<T>.Equals(T x, T y)
{
return EqualsMethod(x, y);
}
int IEqualityComparer<T>.GetHashCode(T obj)
{
if (GetHashCodeMethod == null)
return obj.GetHashCode();
return GetHashCodeMethod(obj);
}
}
All solutions I've seen here rely on selecting an already comparable field. If one needs to compare in a different way, though, this solution here seems to work generally, for something like:
somedoubles.Distinct(new LambdaComparer<double>((x, y) => Math.Abs(x - y) < double.Epsilon)).Count()
Take another way:
var distinctValues = myCustomerList.
Select(x => x._myCaustomerProperty).Distinct();
The sequence return distinct elements compare them by property '_myCaustomerProperty' .
You can use LambdaEqualityComparer:
var distinctValues
= myCustomerList.Distinct(new LambdaEqualityComparer<OurType>((c1, c2) => c1.CustomerId == c2.CustomerId));
public class LambdaEqualityComparer<T> : IEqualityComparer<T>
{
public LambdaEqualityComparer(Func<T, T, bool> equalsFunction)
{
_equalsFunction = equalsFunction;
}
public bool Equals(T x, T y)
{
return _equalsFunction(x, y);
}
public int GetHashCode(T obj)
{
return obj.GetHashCode();
}
private readonly Func<T, T, bool> _equalsFunction;
}
You can use InlineComparer
public class InlineComparer<T> : IEqualityComparer<T>
{
//private readonly Func<T, T, bool> equalsMethod;
//private readonly Func<T, int> getHashCodeMethod;
public Func<T, T, bool> EqualsMethod { get; private set; }
public Func<T, int> GetHashCodeMethod { get; private set; }
public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
{
if (equals == null) throw new ArgumentNullException("equals", "Equals parameter is required for all InlineComparer instances");
EqualsMethod = equals;
GetHashCodeMethod = hashCode;
}
public bool Equals(T x, T y)
{
return EqualsMethod(x, y);
}
public int GetHashCode(T obj)
{
if (GetHashCodeMethod == null) return obj.GetHashCode();
return GetHashCodeMethod(obj);
}
}
Usage sample:
var comparer = new InlineComparer<DetalleLog>((i1, i2) => i1.PeticionEV == i2.PeticionEV && i1.Etiqueta == i2.Etiqueta, i => i.PeticionEV.GetHashCode() + i.Etiqueta.GetHashCode());
var peticionesEV = listaLogs.Distinct(comparer).ToList();
Assert.IsNotNull(peticionesEV);
Assert.AreNotEqual(0, peticionesEV.Count);
Source:
https://stackoverflow.com/a/5969691/206730
Using IEqualityComparer for Union
Can I specify my explicit type comparator inline?
If Distinct() doesn't produce unique results, try this one:
var filteredWC = tblWorkCenter.GroupBy(cc => cc.WCID_I).Select(grp => grp.First()).Select(cc => new Model.WorkCenter { WCID = cc.WCID_I }).OrderBy(cc => cc.WCID);
ObservableCollection<Model.WorkCenter> WorkCenter = new ObservableCollection<Model.WorkCenter>(filteredWC);
A tricky way to do this is use Aggregate() extension, using a dictionary as accumulator with the key-property values as keys:
var customers = new List<Customer>();
var distincts = customers.Aggregate(new Dictionary<int, Customer>(),
(d, e) => { d[e.CustomerId] = e; return d; },
d => d.Values);
And a GroupBy-style solution is using ToLookup():
var distincts = customers.ToLookup(c => c.CustomerId).Select(g => g.First());
IEnumerable lambda extension:
public static class ListExtensions
{
public static IEnumerable<T> Distinct<T>(this IEnumerable<T> list, Func<T, int> hashCode)
{
Dictionary<int, T> hashCodeDic = new Dictionary<int, T>();
list.ToList().ForEach(t =>
{
var key = hashCode(t);
if (!hashCodeDic.ContainsKey(key))
hashCodeDic.Add(key, t);
});
return hashCodeDic.Select(kvp => kvp.Value);
}
}
Usage:
class Employee
{
public string Name { get; set; }
public int EmployeeID { get; set; }
}
//Add 5 employees to List
List<Employee> lst = new List<Employee>();
Employee e = new Employee { Name = "Shantanu", EmployeeID = 123456 };
lst.Add(e);
lst.Add(e);
Employee e1 = new Employee { Name = "Adam Warren", EmployeeID = 823456 };
lst.Add(e1);
//Add a space in the Name
Employee e2 = new Employee { Name = "Adam Warren", EmployeeID = 823456 };
lst.Add(e2);
//Name is different case
Employee e3 = new Employee { Name = "adam warren", EmployeeID = 823456 };
lst.Add(e3);
//Distinct (without IEqalityComparer<T>) - Returns 4 employees
var lstDistinct1 = lst.Distinct();
//Lambda Extension - Return 2 employees
var lstDistinct = lst.Distinct(employee => employee.EmployeeID.GetHashCode() ^ employee.Name.ToUpper().Replace(" ", "").GetHashCode());
The Microsoft System.Interactive package has a version of Distinct that takes a key selector lambda. This is effectively the same as Jon Skeet's solution, but it may be helpful for people to know, and to check out the rest of the library.
Here's how you can do it:
public static class Extensions
{
public static IEnumerable<T> MyDistinct<T, V>(this IEnumerable<T> query,
Func<T, V> f,
Func<IGrouping<V,T>,T> h=null)
{
if (h==null) h=(x => x.First());
return query.GroupBy(f).Select(h);
}
}
This method allows you to use it by specifying one parameter like .MyDistinct(d => d.Name), but it also allows you to specify a having condition as a second parameter like so:
var myQuery = (from x in _myObject select x).MyDistinct(d => d.Name,
x => x.FirstOrDefault(y=>y.Name.Contains("1") || y.Name.Contains("2"))
);
N.B. This would also allow you to specify other functions like for example .LastOrDefault(...) as well.
If you want to expose just the condition, you can have it even simpler by implementing it as:
public static IEnumerable<T> MyDistinct2<T, V>(this IEnumerable<T> query,
Func<T, V> f,
Func<T,bool> h=null
)
{
if (h == null) h = (y => true);
return query.GroupBy(f).Select(x=>x.FirstOrDefault(h));
}
In this case, the query would just look like:
var myQuery2 = (from x in _myObject select x).MyDistinct2(d => d.Name,
y => y.Name.Contains("1") || y.Name.Contains("2")
);
N.B. Here, the expression is simpler, but note .MyDistinct2 uses .FirstOrDefault(...) implicitly.
Note: The examples above are using the following demo class
class MyObject
{
public string Name;
public string Code;
}
private MyObject[] _myObject = {
new MyObject() { Name = "Test1", Code = "T"},
new MyObject() { Name = "Test2", Code = "Q"},
new MyObject() { Name = "Test2", Code = "T"},
new MyObject() { Name = "Test5", Code = "Q"}
};
I'm assuming you have an IEnumerable<T>, and in your example delegate, you would like c1 and c2 to be referring to two elements in this list?
I believe you could achieve this with a self join:
var distinctResults = from c1 in myList
join c2 in myList on <your equality conditions>
I found this as the easiest solution.
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
return source.GroupBy(keySelector).Select(x => x.FirstOrDefault());
}

Categories

Resources