Chaining fluent LINQ-like queries together - c#

I want to build a fluent API to iterate over an array, where each step consumes some values and the next step continues with the remaining (not yet consumed) values. Something like this pseudo-code:
int[] input = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
from a in Take(3) // a = {5,4,1}
from b in Skip(4) // b = null
from c in TakeWhile(x=> x != 0) // c = {7, 2}
select new Stuff(a, b, c)
I don't know where to start looking or what the basis for something like this would be, so I wanted to ask for some help.
The system should not be restricted to int values. Another example:
string[] input = { "how", "are", "you", "doing", "?" };
from a in OneOf("how", "what", "where") // a = "how"
from b in Match("are") // b = "are"
from c in TakeWhile(x=> x != "?") // c = { "you", "doing" }
select new Stuff(a, b, c)

The following code lets you write input.FirstTake(3).ThenSkip(4).ThenTakeWhile(x => x != 0); to get the sequence 5, 4, 1, 7, 2. The main idea is to keep track of the takes and skips you want so they can be applied when you iterate, similar to how OrderBy and ThenBy work. Note that you cannot put other LINQ operations in between: this builds up one enumeration of consecutive skips and takes, and the resulting sequence is then fed through any LINQ operations you tack on afterwards.
public interface ITakeAndSkip<out T> : IEnumerable<T>
{
ITakeAndSkip<T> ThenSkip(int number);
ITakeAndSkip<T> ThenTake(int number);
ITakeAndSkip<T> ThenTakeWhile(Func<T, bool> predicate);
ITakeAndSkip<T> ThenSkipWhile(Func<T, bool> predicate);
}
public class TakeAndSkip<T> : ITakeAndSkip<T>
{
private readonly IEnumerable<T> _source;
private class TakeOrSkipOperation
{
public bool IsSkip { get; private set; }
public Func<T, bool> Predicate { get; private set; }
public int Number { get; private set; }
private TakeOrSkipOperation()
{
}
public static TakeOrSkipOperation Skip(int number)
{
return new TakeOrSkipOperation
{
IsSkip = true,
Number = number
};
}
public static TakeOrSkipOperation Take(int number)
{
return new TakeOrSkipOperation
{
Number = number
};
}
public static TakeOrSkipOperation SkipWhile(Func<T, bool> predicate)
{
return new TakeOrSkipOperation
{
IsSkip = true,
Predicate = predicate
};
}
public static TakeOrSkipOperation TakeWhile(Func<T, bool> predicate)
{
return new TakeOrSkipOperation
{
Predicate = predicate
};
}
}
private readonly List<TakeOrSkipOperation> _operations = new List<TakeOrSkipOperation>();
public TakeAndSkip(IEnumerable<T> source)
{
_source = source;
}
public IEnumerator<T> GetEnumerator()
{
using (var enumerator = _source.GetEnumerator())
{
// move to the first item and if there are none just return
if (!enumerator.MoveNext()) yield break;
// Then apply all the skip and take operations
foreach (var operation in _operations)
{
int n = operation.Number;
// If we are not dealing with a while then make the predicate count
// down the number to zero.
var predicate = operation.Predicate ?? (x => n-- > 0);
// Iterate the items until there are no more or the predicate is false
bool more = true;
while (more && predicate(enumerator.Current))
{
// If this is a Take then yield the current item.
if (!operation.IsSkip) yield return enumerator.Current;
more = enumerator.MoveNext();
}
// If there are no more items return
if (!more) yield break;
}
// Now we need to decide what to do with the rest of the items.
// If there are no operations or the last one was a skip then
// return the remaining items
if (_operations.Count == 0 || _operations.Last().IsSkip)
{
do
{
yield return enumerator.Current;
} while (enumerator.MoveNext());
}
// Otherwise the last operation was a take and we're done.
}
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
public ITakeAndSkip<T> ThenSkip(int number)
{
_operations.Add(TakeOrSkipOperation.Skip(number));
return this;
}
public ITakeAndSkip<T> ThenSkipWhile(Func<T, bool> predicate)
{
_operations.Add(TakeOrSkipOperation.SkipWhile(predicate));
return this;
}
public ITakeAndSkip<T> ThenTake(int number)
{
_operations.Add(TakeOrSkipOperation.Take(number));
return this;
}
public ITakeAndSkip<T> ThenTakeWhile(Func<T, bool> predicate)
{
_operations.Add(TakeOrSkipOperation.TakeWhile(predicate));
return this;
}
}
public static class TakeAndSkipExtensions
{
public static ITakeAndSkip<T> FirstTake<T>(this IEnumerable<T> source, int number)
{
return new TakeAndSkip<T>(source).ThenTake(number);
}
public static ITakeAndSkip<T> FirstSkip<T>(this IEnumerable<T> source, int number)
{
return new TakeAndSkip<T>(source).ThenSkip(number);
}
public static ITakeAndSkip<T> FirstTakeWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
return new TakeAndSkip<T>(source).ThenTakeWhile(predicate);
}
public static ITakeAndSkip<T> FirstSkipWhile<T>(this IEnumerable<T> source, Func<T, bool> predicate)
{
return new TakeAndSkip<T>(source).ThenSkipWhile(predicate);
}
}
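If the individual segments are needed separately (the a, b and c of the question) rather than as one flattened sequence, the same "shared enumerator" idea can be sketched as a small cursor. This is only a minimal sketch: SequenceCursor and its members are hypothetical names, not part of the answer above, and like the answer it leaves the element that ended a TakeWhile available to the next operation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: every call consumes items from one shared enumerator,
// so each operation continues where the previous one stopped.
public sealed class SequenceCursor<T> : IDisposable
{
    private readonly IEnumerator<T> _enumerator;
    private bool _hasCurrent;

    public SequenceCursor(IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        _enumerator = source.GetEnumerator();
        _hasCurrent = _enumerator.MoveNext();
    }

    // Returns up to count items and advances past them.
    public List<T> Take(int count) { return Consume(_ => count-- > 0, true); }

    // Advances past up to count items without returning them.
    public void Skip(int count) { Consume(_ => count-- > 0, false); }

    // Returns items while the predicate holds; the first failing item is not consumed.
    public List<T> TakeWhile(Func<T, bool> predicate) { return Consume(predicate, true); }

    private List<T> Consume(Func<T, bool> predicate, bool keep)
    {
        var result = new List<T>();
        while (_hasCurrent && predicate(_enumerator.Current))
        {
            if (keep) result.Add(_enumerator.Current);
            _hasCurrent = _enumerator.MoveNext();
        }
        return result;
    }

    public void Dispose() { _enumerator.Dispose(); }
}
```

With the int array from the question, Take(3) would return 5, 4, 1, Skip(4) would pass over 3, 9, 8, 6, and TakeWhile(x => x != 0) would return 7, 2.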

Related

How to apply contain on last record and delete if found in LINQ?

I have a list of strings like
AAPL,28/03/2012,88.34,88.778,87.187,88.231,163682382
AAPL,29/03/2012,87.54,88.08,86.747,87.123,151551216
FB,30/03/2012,86.967,87.223,85.42,85.65,182255227
Now I want to delete only the last record if it does not contain AAPL (the symbol name), using LINQ.
Below is my code, which spans multiple lines, but I want to make it a single line:
fileLines = System.IO.File.ReadAllLines(fileName).AsParallel().Skip(1).ToList();
var lastLine = fileLines.Last();
if (!lastLine.Contains(item.sym))
{
fileLines.RemoveAt(fileLines.Count - 1);
}
So how can I do all of this in a single-line LINQ query?
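You could use the ternary operator to decide on the tail to concatenate: keep everything except the last line, then append the last line only if it contains the symbol.
fileLines
= fileLines.Take(fileLines.Count - 1)
.Concat(fileLines.Last().Contains(item.sym) ? new[] { fileLines.Last() }
: Enumerable.Empty<string>())
.ToList();
You could contract it further by folding the file read into the same chain. Note that this form still refers to fileLines on the right-hand side, so fileLines must already hold the lines from an earlier read. AsParallel() is left out because PLINQ does not preserve order by default, which Skip, Take and Last all rely on.
fileLines
= System.IO.File.ReadAllLines(fileName)
.Skip(1)
.Take(fileLines.Count - 1)
.Concat(fileLines.Last().Contains(item.sym) ? new[] { fileLines.Last() }
: Enumerable.Empty<string>())
.ToList();
That being said, such an endeavour is questionable. A long chain of lazily evaluated LINQ extension methods is difficult to debug.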
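For comparison, the same rule can be written as a small named helper, which may be easier to step through than a single chained expression. This is a sketch under assumptions: DropLastUnlessContains is a hypothetical name, and it returns a new list rather than modifying its input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LastLineFilter
{
    // Hypothetical helper: returns a copy of lines, without the last entry
    // when that entry does not contain symbol.
    public static List<string> DropLastUnlessContains(IReadOnlyList<string> lines, string symbol)
    {
        if (lines.Count == 0) return new List<string>();
        bool keepLast = lines[lines.Count - 1].Contains(symbol);
        return lines.Take(keepLast ? lines.Count : lines.Count - 1).ToList();
    }
}
```

The call site then stays a single line, for example fileLines = LastLineFilter.DropLastUnlessContains(fileLines, item.sym);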
I understand you want to simplify the filtering operation. From what I see, you're missing only one piece of information (whether or not the current item is the last one in the enumerated collection) that would let you define your predicate. What follows is not "a simple single line", but it is a reusable extension that provides this piece of information (and more) without any extra passes over the data.
The final product will be:
IEnumerable<string> fileLines = System.IO.File.ReadLines(fileName).RichWhere((item, originalIndex, countedIndex, hasMoreItems) => hasMoreItems || item.StartsWith("AAPL"));
The LINQ-like extension that I wrote, inspired by Microsoft's Enumerable at ReferenceSource:
public delegate bool RichPredicate<T>(T item, int originalIndex, int countedIndex, bool hasMoreItems);
public static class EnumerableExtensions
{
/// <remarks>
/// This was contributed by Aly El-Haddad as an answer to this Stackoverflow.com question:
/// https://stackoverflow.com/q/54829095/3602352
/// </remarks>
public static IEnumerable<T> RichWhere<T>(this IEnumerable<T> source, RichPredicate<T> predicate)
{
return new RichWhereIterator<T>(source, predicate);
}
private class RichWhereIterator<T> : IEnumerable<T>, IEnumerator<T>
{
private readonly int threadId;
private readonly IEnumerable<T> source;
private readonly RichPredicate<T> predicate;
private IEnumerator<T> enumerator;
private int state;
private int countedIndex = -1;
private int originalIndex = -1;
private bool hasMoreItems;
public RichWhereIterator(IEnumerable<T> source, RichPredicate<T> predicate)
{
threadId = Thread.CurrentThread.ManagedThreadId;
this.source = source ?? throw new ArgumentNullException(nameof(source));
this.predicate = predicate ?? ((item, originalIndex, countedIndex, hasMoreItems) => true);
}
public T Current { get; private set; }
object IEnumerator.Current => Current;
public void Dispose()
{
if (enumerator is IDisposable disposable)
disposable.Dispose();
enumerator = null;
originalIndex = -1;
countedIndex = -1;
hasMoreItems = false;
Current = default(T);
state = -1;
}
public bool MoveNext()
{
switch (state)
{
case 1:
enumerator = source.GetEnumerator();
if (!(hasMoreItems = enumerator.MoveNext()))
{
Dispose();
break;
}
++originalIndex;
state = 2;
goto case 2;
case 2:
if (!hasMoreItems) //last predicate returned true and that was the last item
{
Dispose();
break;
}
T current = enumerator.Current;
hasMoreItems = enumerator.MoveNext();
++originalIndex;
if (predicate(current, originalIndex - 1, countedIndex + 1, hasMoreItems))
{
++countedIndex;
Current = current;
return true;
}
else if (hasMoreItems)
{ goto case 2; }
//predicate returned false and there're no more items
Dispose();
break;
}
return false;
}
public void Reset()
{
Current = default(T);
hasMoreItems = false;
originalIndex = -1;
countedIndex = -1;
state = 1;
}
public IEnumerator<T> GetEnumerator()
{
if (threadId == Thread.CurrentThread.ManagedThreadId && state == 0)
{
state = 1;
return this;
}
return new RichWhereIterator<T>(source, predicate) { state = 1 };
}
IEnumerator IEnumerable.GetEnumerator()
{
return GetEnumerator();
}
}
}
RichPredicate<T>, which can be thought of as Func<T, int, int, bool, bool>, provides this information about each item:
item: the item to evaluate.
originalIndex: the index of that item in its original IEnumerable<T> source (the one passed directly to RichWhere).
countedIndex: the index that item would have in the filtered output if the predicate evaluates to true.
hasMoreItems: tells whether the original IEnumerable<T> source has more items after this one (false for the last item).
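The look-ahead behind hasMoreItems can also be sketched with a plain iterator method: hold one item back, and release it only once it is known whether another item follows. WithHasMore is a hypothetical name used for illustration, and this sketch omits the index bookkeeping of RichWhere.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookaheadExtensions
{
    // Hypothetical helper: pairs each item with a flag saying whether
    // the source has at least one more item after it.
    public static IEnumerable<(T Item, bool HasMore)> WithHasMore<T>(this IEnumerable<T> source)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        return Iterate(source);
    }

    private static IEnumerable<(T Item, bool HasMore)> Iterate<T>(IEnumerable<T> source)
    {
        using (var e = source.GetEnumerator())
        {
            if (!e.MoveNext()) yield break;
            T current = e.Current;
            // Each item is held back until we know whether another one follows.
            while (e.MoveNext())
            {
                yield return (current, true);
                current = e.Current;
            }
            yield return (current, false);
        }
    }
}
```

The filter from the question could then be written as lines.WithHasMore().Where(x => x.HasMore || x.Item.StartsWith("AAPL")).Select(x => x.Item).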

Creating a collection with a function to obtain the next member

I need to accumulate values into a collection, based on an arbitrary function. Each value is derived from calling a function on the previous value.
My current attempt:
public static T[] Aggregate<T>(this T source, Func<T, T> func)
{
var arr = new List<T> { };
var current = source;
while(current != null)
{
arr.Add(current);
current = func(current);
}
return arr.ToArray();
}
Is there a built-in .Net Framework function to do this?
This operation is usually called Unfold. There's no built-in version but it is implemented in FSharp.Core, so you could wrap that:
public static IEnumerable<T> Unfold<T, TState>(TState init, Func<TState, Tuple<T, TState>> gen)
{
// gen returns the next (value, state) pair, or null to end the sequence
var liftF = new Converter<TState, Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>>(x =>
{
var r = gen(x);
if (r == null)
{
return Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>.None;
}
else
{
return Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>.Some(r);
}
});
var ff = Microsoft.FSharp.Core.FSharpFunc<TState, Microsoft.FSharp.Core.FSharpOption<Tuple<T, TState>>>.FromConverter(liftF);
return Microsoft.FSharp.Collections.SeqModule.Unfold<TState, T>(ff, init);
}
public static IEnumerable<T> Unfold<T>(T source, Func<T, T> func)
{
// yield the current value, then continue from func(current); stop at null
return Unfold<T, T>(source, x => x == null ? null : Tuple.Create(x, func(x)));
}
However, writing your own version would be simpler:
public static IEnumerable<T> Unfold<T>(T source, Func<T, T> func)
{
T current = source;
while(current != null)
{
yield return current;
current = func(current);
}
}
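As a usage illustration, here is a sketch that walks an exception's InnerException chain with the hand-written Unfold. The class constraint and the MessageChain helper are assumptions added for this example; they are not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnfoldExample
{
    // Same shape as the hand-written Unfold above; restricted to reference
    // types here (an assumption) so that the null check is always meaningful.
    public static IEnumerable<T> Unfold<T>(T source, Func<T, T> func) where T : class
    {
        for (T current = source; current != null; current = func(current))
        {
            yield return current;
        }
    }

    // Hypothetical helper: collects the messages of an exception and all its inner exceptions.
    public static List<string> MessageChain(Exception ex)
    {
        return Unfold(ex, e => e.InnerException).Select(e => e.Message).ToList();
    }
}
```

The sequence ends as soon as func returns null, which is why the approach fits naturally on parent or inner-object chains.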
You are referring to an anamorphism (see the "linq-unfold-operator" question), which is the dual of a catamorphism.
Unfold is the dual of Aggregate. Aggregate exists in the .NET Framework; Unfold does not (for some unknown reason), hence your confusion.
/// seeds: the initial data to unfold
/// stop: if stop(seed) is True, don't go any further
/// map: transform the seed into the final data
/// next: generate the next seed value from the current seed
public static IEnumerable<R> UnFold<T,R>(this IEnumerable<T> seeds, Predicate<T> stop,
Func<T,R> map, Func<T,IEnumerable<T>> next) {
foreach (var seed in seeds) {
if (!stop(seed)) {
yield return map(seed);
foreach (var val in next(seed).UnFold(stop, map, next))
yield return val;
}
}
}
Usage Example:
var parents = new[]{someType}.UnFold(t => t == null, t => t,
t => t.GetInterfaces().Concat(new[]{t.BaseType}))
.Distinct();

StartWith method for arrays

Is there a StartWith method for arrays in .NET? Or something similar to it in LINQ?
string[] arr1 = { "A", "B", "C" };
string[] arr2 = { "A", "B", "C", "D" };
string[] arr3 = { "A", "B", "CD" };
string[] arr4 = { "E", "A", "B", "C" };
arr2.StartWith(arr1) // true
arr1.StartWith(arr2) // false
arr3.StartWith(arr1) // false
arr4.StartWith(arr1) // false
Or should I do it the straightforward way:
bool StartWith(string[] arr1, string[] arr2)
{
if (arr1.Length < arr2.Length) return false;
for (var i = 0; i < arr2.Length; i++)
{
if (arr2[i] != arr1[i]) return false;
}
return true;
}
I'm looking for the most efficient way to do that.
bool answer = arr2.Take(arr1.Length).SequenceEqual(arr1);
Your "straightforward" way is how most LINQ methods would do it anyway. There are a few tweaks you could make, for example turning it into an extension method and taking a comparer, so that custom comparers can be used.
public static class ExtensionMethods
{
public static bool StartWith<T>(this T[] arr1, T[] arr2)
{
return StartWith(arr1, arr2, EqualityComparer<T>.Default);
}
public static bool StartWith<T>(this T[] arr1, T[] arr2, IEqualityComparer<T> comparer)
{
if (arr1.Length < arr2.Length) return false;
for (var i = 0; i < arr2.Length; i++)
{
if (!comparer.Equals(arr2[i], arr1[i])) return false;
}
return true;
}
}
UPDATE: For fun I decided to take the time to write a slightly more "advanced" version that works with any IEnumerable<T>, not just arrays.
public static class ExtensionMethods
{
public static bool StartsWith<T>(this IEnumerable<T> @this, IEnumerable<T> startsWith)
{
return StartsWith(@this, startsWith, EqualityComparer<T>.Default);
}
public static bool StartsWith<T>(this IEnumerable<T> @this, IEnumerable<T> startsWith, IEqualityComparer<T> comparer)
{
if (@this == null) throw new ArgumentNullException("this");
if (startsWith == null) throw new ArgumentNullException("startsWith");
if (comparer == null) throw new ArgumentNullException("comparer");
//Check to see if both types implement ICollection<T> to get a free Count check.
var thisCollection = @this as ICollection<T>;
var startsWithCollection = startsWith as ICollection<T>;
if (thisCollection != null && startsWithCollection != null && (thisCollection.Count < startsWithCollection.Count))
return false;
using (var thisEnumerator = @this.GetEnumerator())
using (var startsWithEnumerator = startsWith.GetEnumerator())
{
//Keep looping till the startsWithEnumerator runs out of items.
while (startsWithEnumerator.MoveNext())
{
//Check to see if the thisEnumerator ran out of items.
if (!thisEnumerator.MoveNext())
return false;
if (!comparer.Equals(thisEnumerator.Current, startsWithEnumerator.Current))
return false;
}
}
return true;
}
}
You can do:
var result = arr2.Take(arr1.Length).SequenceEqual(arr1);
To optimize it further, you can add the check arr2.Length >= arr1.Length at the start:
var result = arr2.Length >= arr1.Length && arr2.Take(arr1.Length).SequenceEqual(arr1);
The end result would be the same.
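Applied to the four arrays from the question, the length check plus Take and SequenceEqual can be sketched as follows. StartsWithPrefix is a hypothetical helper name chosen for this example.

```csharp
using System;
using System.Linq;

public static class PrefixCheck
{
    // Hypothetical helper: true when sequence begins with all elements of prefix, in order.
    public static bool StartsWithPrefix(string[] sequence, string[] prefix)
    {
        return sequence.Length >= prefix.Length
            && sequence.Take(prefix.Length).SequenceEqual(prefix);
    }

    public static void Demo()
    {
        string[] arr1 = { "A", "B", "C" };
        string[] arr2 = { "A", "B", "C", "D" };
        string[] arr3 = { "A", "B", "CD" };
        string[] arr4 = { "E", "A", "B", "C" };

        Console.WriteLine(StartsWithPrefix(arr2, arr1)); // True
        Console.WriteLine(StartsWithPrefix(arr1, arr2)); // False: arr1 is shorter than arr2
        Console.WriteLine(StartsWithPrefix(arr3, arr1)); // False: "CD" != "C"
        Console.WriteLine(StartsWithPrefix(arr4, arr1)); // False: first element differs
    }
}
```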
Try Enumerable.SequenceEqual(a1, a2), but trim the longer array to the prefix length first, i.e.,
string[] arr1 = { "A", "B", "C" };
string[] arr2 = { "A", "B", "C", "D" };
if (Enumerable.SequenceEqual(arr1, arr2.Take(arr1.Length)))
You don't want to require everything to be an array, and you don't want to call Count() on an IEnumerable<T> that may be a large query, when you only really want to sniff at the first four items or whatever.
public static class Extensions
{
public static void Test()
{
var a = new[] { "a", "b" };
var b = new[] { "a", "b", "c" };
var c = new[] { "a", "b", "c", "d" };
var d = new[] { "x", "y" };
Console.WriteLine("b.StartsWith(a): {0}", b.StartsWith(a));
Console.WriteLine("b.StartsWith(c): {0}", b.StartsWith(c));
Console.WriteLine("b.StartsWith(d, x => x.Length): {0}",
b.StartsWith(d, x => x.Length));
}
public static bool StartsWith<T>(
this IEnumerable<T> sequence,
IEnumerable<T> prefixCandidate,
Func<T, T, bool> compare = null)
{
using (var eseq = sequence.GetEnumerator())
using (var eprefix = prefixCandidate.GetEnumerator())
{
if (compare == null)
{
compare = (x, y) => Object.Equals(x, y);
}
// Advance the prefix first, so that an empty prefix matches any sequence.
while (eprefix.MoveNext())
{
// The sequence ran out before the prefix did.
if (!eseq.MoveNext())
return false;
if (!compare(eseq.Current, eprefix.Current))
return false;
}
return true;
}
}
public static bool StartsWith<T, TProperty>(
this IEnumerable<T> sequence,
IEnumerable<T> prefixCandidate,
Func<T, TProperty> selector)
{
using (var eseq = sequence.GetEnumerator())
using (var eprefix = prefixCandidate.GetEnumerator())
{
// Advance the prefix first, so that an empty prefix matches any sequence.
while (eprefix.MoveNext())
{
// The sequence ran out before the prefix did.
if (!eseq.MoveNext())
return false;
if (!Object.Equals(
selector(eseq.Current),
selector(eprefix.Current)))
{
return false;
}
}
return true;
}
}
}
Here are some different ways of doing that. I didn't optimize or fully validate everything, so there is room for improvement everywhere, but this should give you some idea.
The best performance generally comes from going low level: if you grab the enumerator and go step by step, you can get much faster results.
Methods and performance results:
StartsWith1 00:00:01.9014586
StartsWith2 00:00:02.1227468
StartsWith3 00:00:03.2222109
StartsWith4 00:00:05.5544177
Test method:
var watch = new Stopwatch();
watch.Start();
for (int i = 0; i < 10000000; i++)
{
bool test = action(arr2, arr1);
}
watch.Stop();
return watch.Elapsed;
Methods:
public static class IEnumerableExtender
{
public static bool StartsWith1<T>(this IEnumerable<T> source, IEnumerable<T> compare)
{
if (source.Count() < compare.Count())
{
return false;
}
using (var se = source.GetEnumerator())
{
using (var ce = compare.GetEnumerator())
{
while (ce.MoveNext() && se.MoveNext())
{
if (!ce.Current.Equals(se.Current))
{
return false;
}
}
}
}
return true;
}
public static bool StartsWith2<T>(this IEnumerable<T> source, IEnumerable<T> compare) =>
source.Take(compare.Count()).SequenceEqual(compare);
public static bool StartsWith3<T>(this IEnumerable<T> source, IEnumerable<T> compare)
{
if (source == null)
{
throw new ArgumentNullException(nameof(source));
}
if (compare == null)
{
throw new ArgumentNullException(nameof(compare));
}
if (source.Count() < compare.Count())
{
return false;
}
return compare.SequenceEqual(source.Take(compare.Count()));
}
public static bool StartsWith4<T>(this IEnumerable<T> arr1, IEnumerable<T> arr2)
{
return StartsWith4(arr1, arr2, EqualityComparer<T>.Default);
}
public static bool StartsWith4<T>(this IEnumerable<T> arr1, IEnumerable<T> arr2, IEqualityComparer<T> comparer)
{
if (arr1.Count() < arr2.Count()) return false;
for (var i = 0; i < arr2.Count(); i++)
{
if (!comparer.Equals(arr2.ElementAt(i), arr1.ElementAt(i))) return false;
}
return true;
}
}

OutOfMemoryException with big collection and OrderBy?

I have a collection trends with about 30 million elements. When I try to execute the following code in LINQPad
trends.Take(count).Dump();
it works ok.
But if i add sort:
trends.OrderByDescending(x => x.Item2).Take(count).Dump();
I get a System.OutOfMemoryException.
What am I doing wrong?
OrderByDescending (or OrderBy) materializes the whole sequence when you try to fetch the first element - it has to, as otherwise you can't possibly know the first element. It has to make a copy of the sequence (typically just a bunch of references, of course) in order to sort, so if the original sequence is an in-memory collection, you end up with two copies of it. Presumably you don't have enough memory for that.
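The difference can be made visible by counting how many items each query pulls from its source. The following is only an illustrative sketch: BufferingDemo and its members are hypothetical names, and the counter is a static field purely to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BufferingDemo
{
    // Number of items the source has produced since the last reset.
    public static int Pulled;

    private static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Take alone stops pulling once it has enough items.
    public static int PulledByTake(int total, int take)
    {
        Pulled = 0;
        Numbers(total).Take(take).ToList();
        return Pulled;
    }

    // Sorting first forces the whole source to be read before the first result.
    public static int PulledBySortedTake(int total, int take)
    {
        Pulled = 0;
        Numbers(total).OrderByDescending(x => x).Take(take).ToList();
        return Pulled;
    }
}
```

With total = 1000 and take = 3, the plain Take reads only a handful of items, while the sorted variant reads all 1000 before it can return anything.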
You don't have to sort the whole collection just to take the top count elements from it. Here is a solution for this: https://codereview.stackexchange.com/a/9777/11651.
The key point of that answer is that it doesn't require all items to be kept in memory (for sorting).
Again, from the comments on the linked answer:
The idea is: you can find the Max (or Min) item of a list in O(n) time. If you extend this idea to m items (5 in the question), you can get the top (or bottom) m items faster than by sorting the list (just one pass over the list plus the cost of keeping 5 sorted items).
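The one-pass idea described above can be sketched with a small buffer that is kept sorted and never grows beyond m items. This is not the code from the linked answer, only a minimal illustration; TopM and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class TopSelection
{
    // Hypothetical helper: returns the m largest items, largest first,
    // holding at most m + 1 items in memory at any time.
    public static List<T> TopM<T>(IEnumerable<T> source, int m, IComparer<T> comparer = null)
    {
        if (m <= 0) return new List<T>();
        comparer = comparer ?? Comparer<T>.Default;
        var best = new List<T>(m + 1); // ascending; best[0] is the smallest item kept

        foreach (var item in source)
        {
            // Cheap rejection: the buffer is full and the item does not beat its minimum.
            if (best.Count == m && comparer.Compare(item, best[0]) <= 0) continue;

            int index = best.BinarySearch(item, comparer);
            if (index < 0) index = ~index; // insertion point when the item is not present
            best.Insert(index, item);

            if (best.Count > m) best.RemoveAt(0); // drop the current minimum
        }

        best.Reverse();
        return best;
    }
}
```

For a small, fixed m each item costs at most a binary search and a short shift, so the whole pass stays close to linear in the size of the source.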
Here is another extension method that may work better than the original LINQ (e.g. it shouldn't blow up for a small number of selected items). Like L.B.'s solution it should be O(n) and doesn't keep all items in memory:
public static class Enumerables
{
public static IEnumerable<T> TopN<T, TV>(this IEnumerable<T> value, Func<T, TV> selector, Int32 count, IComparer<TV> comparer)
{
var qCount = 0;
var queue = new SortedList<TV, List<T>>(count, comparer);
foreach (var val in value)
{
var currTv = selector(val);
if (qCount >= count && comparer.Compare(currTv, queue.Keys[0]) <= 0) continue;
if (qCount == count)
{
var list = queue.Values[0];
if (list.Count == 1)
queue.RemoveAt(0);
else
list.RemoveAt(0);
qCount--;
}
if (queue.ContainsKey(currTv))
queue[currTv].Add(val);
else
queue.Add(currTv, new List<T> {val});
qCount++;
}
return queue.SelectMany(kvp => kvp.Value);
}
public static IEnumerable<T> TopN<T, TV>(this IEnumerable<T> value, Func<T, TV> selector, Int32 count)
{
return value.TopN(selector, count, Comparer<TV>.Default);
}
public static IEnumerable<T> BottomN<T, TV>(this IEnumerable<T> value, Func<T, TV> selector, Int32 count, IComparer<TV> comparer)
{
return value.TopN(selector, count, new ReverseComparer<TV>(comparer));
}
public static IEnumerable<T> BottomN<T, TV>(this IEnumerable<T> value, Func<T, TV> selector, Int32 count)
{
return value.BottomN(selector, count, Comparer<TV>.Default);
}
}
// Helper class
public class ReverseComparer<T> : IComparer<T>
{
private readonly IComparer<T> _comparer;
public int Compare(T x, T y)
{
return -1*_comparer.Compare(x, y);
}
public ReverseComparer()
: this(Comparer<T>.Default)
{ }
public ReverseComparer(IComparer<T> comparer)
{
if (comparer == null) throw new ArgumentNullException("comparer");
_comparer = comparer;
}
}
And some tests:
[TestFixture]
public class EnumerablesTests
{
[Test]
public void TestTopN()
{
var input = new[] { 1, 2, 8, 3, 6 };
var output = input.TopN(n => n, 3).ToList();
Assert.AreEqual(3, output.Count);
Assert.IsTrue(output.Contains(8));
Assert.IsTrue(output.Contains(6));
Assert.IsTrue(output.Contains(3));
}
[Test]
public void TestBottomN()
{
var input = new[] { 1, 2, 8, 3, 6 };
var output = input.BottomN(n => n, 3).ToList();
Assert.AreEqual(3, output.Count);
Assert.IsTrue(output.Contains(1));
Assert.IsTrue(output.Contains(2));
Assert.IsTrue(output.Contains(3));
}
[Test]
public void TestTopNDupes()
{
var input = new[] { 1, 2, 8, 8, 3, 6 };
var output = input.TopN(n => n, 3).ToList();
Assert.AreEqual(3, output.Count);
Assert.IsTrue(output.Contains(8));
Assert.IsTrue(output.Contains(6));
Assert.IsFalse(output.Contains(3));
}
[Test]
public void TestBottomNDupes()
{
var input = new[] { 1, 1, 2, 8, 3, 6 };
var output = input.BottomN(n => n, 3).ToList();
Assert.AreEqual(3, output.Count);
Assert.IsTrue(output.Contains(1));
Assert.IsTrue(output.Contains(2));
Assert.IsFalse(output.Contains(3));
}
}

Using IEqualityComparer for Union

I simply want to remove duplicates from two lists and combine them into one list. I also need to be able to define what a duplicate is. I define a duplicate by the ColumnIndex property, if they are the same, they are duplicates. Here is the approach I took:
I found a nifty example of how to write inline comparers for the rare occasions where you need them only once in a code segment.
public class InlineComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, T, bool> getEquals;
private readonly Func<T, int> getHashCode;
public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
{
getEquals = equals;
getHashCode = hashCode;
}
public bool Equals(T x, T y)
{
return getEquals(x, y);
}
public int GetHashCode(T obj)
{
return getHashCode(obj);
}
}
Then I just have my two lists, and attempt a union on them with the comparer.
var formatIssues = issues.Where(i => i.IsFormatError == true);
var groupIssues = issues.Where(i => i.IsGroupError == true);
var dupComparer = new InlineComparer<Issue>((i1, i2) => i1.ColumnInfo.ColumnIndex == i2.ColumnInfo.ColumnIndex,
i => i.ColumnInfo.ColumnIndex);
var filteredIssues = groupIssues.Union(formatIssues, dupComparer);
The result set however is null.
Where am I going astray?
I have already confirmed that the two lists have columns with equal ColumnIndex properties.
I've just run your code on a test set.... and it works!
public class InlineComparer<T> : IEqualityComparer<T>
{
private readonly Func<T, T, bool> getEquals;
private readonly Func<T, int> getHashCode;
public InlineComparer(Func<T, T, bool> equals, Func<T, int> hashCode)
{
getEquals = equals;
getHashCode = hashCode;
}
public bool Equals(T x, T y)
{
return getEquals(x, y);
}
public int GetHashCode(T obj)
{
return getHashCode(obj);
}
}
class TestClass
{
public string S { get; set; }
}
[TestMethod]
public void testThis()
{
var l1 = new List<TestClass>()
{
new TestClass() {S = "one"},
new TestClass() {S = "two"},
};
var l2 = new List<TestClass>()
{
new TestClass() {S = "three"},
new TestClass() {S = "two"},
};
var dupComparer = new InlineComparer<TestClass>((i1, i2) => i1.S == i2.S, i => i.S.GetHashCode());
var unionList = l1.Union(l2, dupComparer);
Assert.AreEqual(3, unionList.Count());
}
So... maybe go back and check your test data - or run it with some other test data?
After all - for a Union to be empty - that suggests that both your input lists are also empty?
A slightly simpler way:
it does preserve the original order
it ignores dupes as it finds them
It uses a LINQ extension method:
formatIssues.Union(groupIssues).DistinctBy(x => x.ColumnIndex)
This is the DistinctBy method from MoreLinq:
public static IEnumerable<TSource> DistinctBy<TSource, TKey>
(this IEnumerable<TSource> source, Func<TSource, TKey> keySelector)
{
HashSet<TKey> knownKeys = new HashSet<TKey>();
foreach (TSource element in source)
{
if (knownKeys.Add(keySelector(element)))
{
yield return element;
}
}
}
Would the Linq Except method not do it for you?
var formatIssues = issues.Where(i => i.IsFormatError == true);
var groupIssues = issues.Where(i => i.IsGroupError == true);
var dupeIssues = issues.Where(i => issues.Except(new List<Issue> {i})
.Any(x => x.ColumnIndex == i.ColumnIndex));
var filteredIssues = formatIssues.Union(groupIssues).Except(dupeIssues);
